Abstract 

Constraints placed upon the phenotypes of organisms result from their interactions with 
the environment. Over evolutionary timescales, these constraints feed back onto smaller 
molecular subnetworks comprising the organism. The evolution of biological networks is 
studied by considering a network of a few nodes embedded in a larger context. Taking into 
account the fact that any network under study is actually embedded in a larger context, 
we define network architecture, not on the basis of physical interactions alone, but rather 
as a specification of the manner in which constraints are placed upon the states of its 
nodes. We show that such network architectures possessing cycles in their topology, in 
contrast to those that do not, may be subjected to unsatisfiable constraints. This may be a 
significant factor leading to selection biased against those network architectures where such 
inconsistent constraints are more likely to arise. We proceed to quantify the likelihood of 
inconsistency arising as a function of network architecture, finding that, in the absence of 
sampling bias over the space of possible constraints and for a given network size, networks 
with a larger number of cycles are more likely to have unsatisfiable constraints placed upon 
them. Our results identify a constraint that, at least in isolation, would contribute to a bias 
in the evolutionary process toward more hierarchical-modular versus completely connected 
network architectures. Together, these results highlight the context-dependence of the 
functionality of biological networks. 


1 Introduction 

Probabilistic models of biological networks serve as a bridge between theory and experiment. On the one 
hand, parameters in a probabilistic model can be fit to data obtained by measuring the levels of each 
variable. For example, in gene regulatory networks, gene expression can be measured using microarray or 
sequence census methods [1-3]. On the other hand, one can model a biological network as a deterministic 
or stochastic reaction network which tracks levels of each molecule [4,5]. From the solution to this latter 
kind of model, one can then obtain theoretical predictions for the parameters of the probabilistic model in 
terms of reaction rates. Comparison of the parameters fitted from data with the predicted values serves 
as a means for comparing theory with experiment and can serve as a starting point for improving the 
theory or for designing future experiments [6]. 

An important feature of experimental science is that it involves partial information. In the course of a 
single measurement, one typically is not able to observe a biological network in its entirety. Rather, one 
observes a subnetwork at a time and only obtains a more complete picture by later combining these partial 
views. This contrasts with theory, where one makes a representation of a closed system that provides 
explicit values for all quantities of interest. In order for a probabilistic model to serve its purpose, it 
should also accommodate partial information, and thus we will explicitly consider the effects of 1) carving 
out a subnetwork from its context and 2) coarse-graining observables. Observables representing partial 
information will generally arise in situations where a system is interacting with another system. This 
situation arises in the context of interpreting the potential existence of modular substructure within 
biological network data deriving from any given organism as well as with respect to the interactions 
between an organism and its environment. 

Inconsistency arises when a network context places more constraints on a subnetwork than it is 
capable of satisfying. The impact of this issue on genetic interactions has been considered previously 
in the context of population genetics [7]. We exhibit a method of checking for such consistency and 
evaluating its likelihood of arising in the context of building probabilistic models of biological networks. 
When apparent inconsistency is observed, it must arise from the network context interacting with only 
partial information of the states of a given subnetwork. This would indicate that information about the 
network context must be included in order to maintain a consistent model of the system. 

In Sec. 2 we describe the relationship between representations of biological networks and an abstraction 
of these referred to as network architecture that indicates the manner in which a subset of a network 
is connected to its context. We explain the connection between stochastic process models of biological 
networks and a generalization of the genotype-phenotype map applying to arbitrary biological networks 
referred to as network-network state maps in Sec. 3. Sec. 4-Sec. 6 contain examples of the underlying 
mathematical justification for our claims (more details of which are provided in Supplementary Material), 
and they can be skipped by readers who are primarily interested in the intuitive implications of our 
analysis. In Sec. 4 we introduce the concept of network modules and define probability distributions 
over their states. Sec. 5 and Sec. 6 describe the different compatibility conditions that arise for different 
biological network architectures and demonstrate how these compatibility conditions lead to a set of 
inequalities determining a space of probability distributions for each network architecture. Sec. 7 and 
Sec. 8 examine these constraints for the example of the three-cycle network architecture. Sec. 9 computes 
the likelihood of unsatisfiable constraints for all biological network architectures on four variables that 
possess cycles. Finally, Sec. 10 explains implications for the evolution of biological network architectures 
of the result that networks with a larger number of cycles are more likely to have unsatisfiable constraints 
placed upon them. 

2 Environments of biological networks as abstract contexts 

Most studies of biological networks focus on one type of variable in isolation. For example, many studies 
focus on one of metabolic networks, protein-protein interaction networks, signalling networks, gene- 
regulatory networks, or population and community dynamics in the context of ecological networks. A 
true biological network involves all of these acting together to produce biological phenomena at all 
scales. Models that integrate information about biological networks, rather than focusing exclusively 
on particular types of molecules, will likely become more common in the near future [8-10]. The Systems 
Biology Graphical Notation (SBGN) supports the ability to express many of these networks within the 
context of a single formalism [11], Fig. 1. Even when the different types of biological variables are 
combined into a single network, it is impossible to study all variables simultaneously. As a result, it 
is always the case that a subnetwork is selected for investigation and the remainder of the network is 
treated as an environment or context. In Fig. 1 we show the SBGN process form of six simple examples 
of biological networks. In each case we have selected a subset of variables that form a subnetwork as an 
example of how one might proceed in the investigation of a particular biological system. Once such a 
subnetwork is chosen, it is possible to abstract away the variables that are not part of the subnetwork. 
This is represented by the abstract influence network (AI) for each simple example on the second row of 
Fig. 1. The transformation from SBGN to the AI network is given simply by collapsing the disconnected 
components of the ancestors of each node in the focal subnetwork into single AI nodes. This results in a 
bipartite graph that captures the dependencies among the environmental factors as experienced by the 
subnetwork and nothing more. 

This AI graph is precisely equivalent to an undirected hypergraph if one considers each of the AI nodes 
as a hyperedge containing all of the nodes to which it connects. This is shown as the SH graph in the 
third row of Fig. 1 for each of the simple examples of the SBGN form of biological networks. Considering 
all possible hypergraphs of this kind is equivalent to examining all possible environmental dependency 
structures the subnetwork could be subjected to. Because the AI is fundamental to understanding how 
subnetworks depend upon their contexts, it is the structure of the AI and equivalent SH graphs that we 
refer to as network architecture throughout the paper. We note, from this perspective, that cycles in the 
SBGN representation of the biological network do not necessarily result in corresponding cycles in the AI graph, and 
vice versa. For instance, in example four of Fig. 1, there are no cycles in the SBGN representation of 
the biological network whereas a single cycle exists in the hypergraph representation of the AI graph. 
Furthermore, in example six, there is a cycle in the SBGN representation, whereas there is no cycle in 
the hypergraph representation of the AI. 

More precisely, the collection of variables comprising the subnetwork under consideration is referred 
to as L. The different subsets, O, of biological variables, L, making up the hypergraph representation of 
the AI are each referred to as modules. A biological network architecture may then be represented 
by a subset of all possible such modules subject to two conditions (see Supplementary Material Sec. S2). 
The first represents the fact that each variable of the focal subnetwork must be included in at least one 
module. The second represents the fact that any pair of constraints that are imposed upon overlapping 
sets of variables must agree on those overlapping variables. In expressing the latter condition, all of the 
information present in a collection of lower-order constraints can be expressed as an effective higher-order 
constraint if any such higher-order constraint exists at all. So, if there is a constraint that is imposed 
simultaneously upon two distinct variables and another independent constraint imposed upon only the 
first of the two variables, this situation can be expressed in terms of a single constraint on both of the 
two variables. 

When there is a relatively larger degree of independence in the network context as compared to the 
subnetwork, it is possible for inconsistency to arise. One canonical example of such inconsistency arises in 
the study of ferromagnetism via the Ising model on a triangular lattice where so-called frustration arises 
in the couplings among the magnetic dipole moments of three nearest-neighbor atomic spins [12-14]. In 
this example, the underlying lattice or graph represents interactions among the atomic spins 
according to their spatial proximity. As we have described, in our model, the network architectures to 
which we refer represent the manner in which the network context places constraints upon a subnetwork. 
Inconsistency is likewise capable of arising if there is a cycle in the hypergraph representing this network 
architecture. 


3 Coarse-graining dynamic network states as a generalization 
of genotype-phenotype maps 

Fig. 2A shows a simplified representation of two different biological networks. The correlation strengths 
among their variables are not known but are to be derived from observation of the levels of the entities 
corresponding to each variable. For example, in the context of a gene-regulatory network, the amount 
of a given transcript present in a cell can be binned into a smaller number of discrete classes by setting 
a collection of thresholds on the original data set. If only a single threshold is given, then the data can 
be binned into two classes depending upon whether or not the original measurement surpasses the given 
threshold (Fig. 2B). The time series that results from such observations can be used to infer various 
statistics that characterize the dynamics of a biological network such as correlations between pairs of 
variables. 
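
The single-threshold binning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcript counts, the threshold value of 8, and the helper names `coarse_grain`, `binarize_series` and `pairwise_table` are hypothetical and do not come from the paper or its figures.

```python
from itertools import product


def coarse_grain(count, thresholds):
    """Return the index of the class that `count` falls into.

    With a single threshold T, counts <= T map to 0 and counts > T map to 1.
    """
    return sum(1 for t in thresholds if count > t)


def binarize_series(series, threshold):
    """Apply the single-threshold coarse-graining to a whole time series."""
    return [coarse_grain(x, [threshold]) for x in series]


def pairwise_table(xs, ys):
    """Normalized 2x2 contingency table for two binarized time series."""
    n = len(xs)
    table = {state: 0.0 for state in product((0, 1), repeat=2)}
    for a, b in zip(xs, ys):
        table[(a, b)] += 1.0 / n
    return table


# Hypothetical transcript counts for two genes at eight time points.
gene_a = [3, 12, 15, 2, 9, 14, 1, 11]
gene_b = [10, 2, 1, 13, 4, 3, 12, 2]

a_bits = binarize_series(gene_a, threshold=8)
b_bits = binarize_series(gene_b, threshold=8)
table = pairwise_table(a_bits, b_bits)
```

The resulting `table` is an empirical pairwise distribution of the kind that later sections attach to each edge of a network architecture.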

If a large enough number of thresholds is available to distinguish among all possible counts of the 
variables under investigation, then this observational protocol becomes complementary to mechanistic 
models. There may be several sources for stochasticity in the dynamics including small numbers of the 
causal molecules and products as well as environmental fluctuations upon which these dynamics are 
conditioned [15-23]. Regardless of the fundamental nature of biological networks with respect to their 
potential stochasticity, empirical observations are usually regarded in a statistical manner, and thus we 
focus here on stochastic models. Mathematically, such a model may take the form of a Markov chain 
whose dynamics are governed by a master equation for probability distributions over molecule counts. 
For example, in the case of a three variable network, the master equation takes the form 


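$$
\frac{dP(n_1, n_2, n_3)}{dt} = \sum_{n_1'} \sum_{n_2'} \sum_{n_3'} M(k)_{(n_1, n_2, n_3),(n_1', n_2', n_3')} \, P(n_1', n_2', n_3'),
$$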


where $P(n_1, n_2, n_3)$ gives the probability of observing $n_1$, $n_2$, and $n_3$ molecules of each of the three 
variables respectively and $M(k)$ is a Markov transition rate matrix that depends upon some rate functions 
$k$ that are determined by the network architecture and the dynamics of the interactions. The solution 
to this equation will converge towards a stationary distribution in the limit of long times. Any 
environmental variable having a characteristic timescale longer than that of the variables in the focal 
subnetwork would not be sensitive to transients and would only exhibit control over or be influenced by 
this stationary distribution. 

Interactions between variables may be mediated by a coarse-graining over counts of each variable 
using a function that maps the states representing molecule counts as vectors of natural numbers into 
some other variables. For example, if $n_i$ are natural numbers then a function $f$ taking any number less 
than or equal to some threshold $T$ to 0 and any number greater than $T$ to 1 is a very simple example 
of such a coarse-graining. For this specific form of the coarse-graining function $f$, the coarse-grained 
stationary probability distribution takes the form 


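$$
P_{cg}(b_1, b_2, b_3) = \sum_{n_1 \in f^{-1}(b_1)} \sum_{n_2 \in f^{-1}(b_2)} \sum_{n_3 \in f^{-1}(b_3)} P(n_1, n_2, n_3),
$$

where $b_1, b_2, b_3 \in \{0, 1\}$ and $P$ is the stationary distribution. It is also possible to consider the case where each variable is coarse-grained 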
according to a different threshold and into a different number of classes. An abstract algebraic formulation 
of the coarse-graining process is provided in Supplementary Material Sec. S4. 
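
As a minimal sketch of this coarse-graining step, the following Python fragment sums a distribution over the preimage of each coarse-grained state. The uniform distribution over counts 0 to 3 and the threshold T = 0 are hypothetical choices made only so that the result is easy to check by hand; they are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def f(n, T):
    """Threshold coarse-graining: counts <= T map to 0, counts > T map to 1."""
    return 0 if n <= T else 1


def coarse_grain_distribution(P, T):
    """Sum P(n1, n2, n3) over the preimage of each coarse-grained state (b1, b2, b3)."""
    P_cg = defaultdict(float)
    for counts, prob in P.items():
        P_cg[tuple(f(n, T) for n in counts)] += prob
    return dict(P_cg)


# Hypothetical stationary distribution: uniform over counts 0..3 for each of three variables.
states = list(product(range(4), repeat=3))
P = {s: 1.0 / len(states) for s in states}

P_cg = coarse_grain_distribution(P, T=0)
```

With T = 0, each variable is in class 0 for one of its four counts and in class 1 for the other three, so the coarse-grained state (1, 1, 1) collects 27 of the 64 microstates.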

The most familiar example of such a coarse-graining process in biology is the genotype-phenotype 
map. The genotype of an organism has a relatively straightforward definition in terms of the sequence of 
nucleotides comprising its genome. Phenotypes, on the other hand, can be described at different levels of 
organization [24,25]. The concept of phenotype was initially defined at the level of macroscopically observable 
physical characteristics such as shape, size, color, and various combinations thereof [26]. However, 
since the advent of molecular biology, an example of a lower-level mapping upon which the higher-level 
map from molecular states to macroscopic phenotypes depends is the dynamic phenomenon that can be 
described by measuring the transcription states of all genes comprising an organism’s genome. These 
expression levels of subsets of interacting genes determine which enzymes are produced, thus determining 
the rate at which metabolic reactions proceed. These reaction rates could then be viewed as constituting 
the next level of phenotypes. These in turn determine even higher level phenotypes, ultimately culminating 
in macroscopically observable ones where the concept of phenotype was originally introduced. In 
summary, any mapping from the states of an underlying collection of molecules to a higher-level collective 
property of those molecules that may result from their interaction can be viewed as a generalization 
of the genotype-phenotype map, where the original conception of the latter corresponds to the special 
case where 1) the genes alone are sufficient to determine the higher-level collective property and 2) that 
higher-level collective property is observable at the whole-organism level. 

A more realistic basis upon which to build phenotypes than the one contained in this historical outline 
is one that is not limited to genes alone, but includes all entities constituting a biological 
network. A phenotype must be a function of the levels of, for example, all of the molecular constituents 
that comprise it over time, even if more information is required to fully specify it. The aforementioned 
coarse-grained levels of biological network variables can thus be viewed as collectively determining the 
lowest level in a hierarchy of abstract phenotypes. In what follows, we will assume that we have a 
finite set $L$ of variables and a finite set $P$ of coarse-grained levels of each of those variables. These 
levels may have different units, but they can all be mapped into unitless quantities that account for the 
relevant scale of each variable. In general, each variable could take values in a distinct set $P_i$, with $i \in I$ 
ranging over the variables, whereby $P$ would be required to represent $\bigcup_{i \in I} P_i$ rather than a monolithic 
valuation set lacking any underlying substructure with respect to the variables under consideration. Then 
a possible state of our biological network is represented by a function $e : L \to P$ and coarse-graining a 
stationary distribution will lead to a probability distribution on the set of all maps, denoted $P^L$, from 
subnetworks represented by subsets of $L$ to the respective states of the variables that comprise them. 
We will refer to this more fine-grained generalization of the genotype-phenotype map, where arbitrary 
biological networks are substituted for genes and arbitrary networks states are substituted for phenotypes, 
as network-network state maps. 


4 Probability distributions over network modules 


Here we describe examples of probability distributions over network modules. A more general presentation 
is provided in Supplementary Material Sec. S3. As explained in Sec. 2, for a given biological subnetwork, 
the hypergraph representing the dependencies in the network context consists of subsets, $O$, of the 
variables, $L$, in the subnetwork. If we consider the case in which we have two variables $L = \{l_1, l_2\}$ 
and there are two values, $P = \{0, 1\}$, then there are four possible assignments of values to variables 
each of which constitutes a state of the system. We will write the probability of each of these states as 
$p^{12}_{s_1 s_2}$, indicating that variable $l_1$ is assigned value $s_1$ and variable $l_2$ is assigned value $s_2$. A probability 
distribution over the states of the system for $L$ is then given by 


$$
\{ p^{12}_{00}, p^{12}_{01}, p^{12}_{10}, p^{12}_{11} \mid p^{12}_{00} \geq 0,\; p^{12}_{01} \geq 0,\; p^{12}_{10} \geq 0,\; p^{12}_{11} \geq 0,\; p^{12}_{00} + p^{12}_{01} + p^{12}_{10} + p^{12}_{11} = 1 \}. \tag{1}
$$

This imposes the standard conditions that probabilities are nonnegative and sum to one. If we have the 
subset of $L$ given by $O = \{l_1\}$ then a probability distribution over its states is given by 

$$
\{ p^{1}_{0}, p^{1}_{1} \mid p^{1}_{0} \geq 0,\; p^{1}_{1} \geq 0,\; p^{1}_{0} + p^{1}_{1} = 1 \}. \tag{2}
$$


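In order to be consistent, the distribution expressed in Eq. 1 should be related to that of Eq. 2 via a 
marginalization matrix 

$$
\begin{pmatrix} p^{1}_{0} \\ p^{1}_{1} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} p^{12}_{00} \\ p^{12}_{01} \\ p^{12}_{10} \\ p^{12}_{11} \end{pmatrix}. \tag{3}
$$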
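
The marginalization in Eq. 3 is an ordinary matrix-vector product, which the short Python sketch below makes concrete. The numerical values for the pairwise distribution are the ones the text later quotes for the edge {l1, l2} of Fig. 3A; the helper name `matvec` and the state ordering 00, 01, 10, 11 are assumptions made for this illustration.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Marginalization matrix taking a distribution over (l1, l2) to one over l1.
# Columns are ordered as the joint states 00, 01, 10, 11.
MARGINALIZE_TO_L1 = [
    [1, 1, 0, 0],  # p^1_0 = p^12_00 + p^12_01
    [0, 0, 1, 1],  # p^1_1 = p^12_10 + p^12_11
]

# Pairwise distribution over (l1, l2) in the order 00, 01, 10, 11.
p12 = [0.1, 0.4, 0.4, 0.1]

p1 = matvec(MARGINALIZE_TO_L1, p12)
```

Because each column of the matrix contains exactly one 1, the output is automatically normalized whenever the input is.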


5 Compatibility of distributions on network-network state maps 

Here we provide an example of compatibility conditions on network-network state maps. A more general 
mathematical characterization of these constraints is provided in Supplementary Material Sec. S5. When 
one has a non-trivial network architecture (corresponding to the SH hypergraph like those in Fig. 1), there 
will typically be more than one way of obtaining a probability distribution on a set by marginalizing a 
distribution on a larger set. For instance, if we have a network with three binary variables and two edges, 
$\{l_1, l_2\}$ and $\{l_1, l_3\}$, then we can obtain a probability distribution on the set $\{l_1\}$ either by marginalizing 
probabilities defined over $\{l_1, l_2\}$ as was done above or by marginalizing probabilities defined over $\{l_1, l_3\}$ 
to obtain 



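$$
\begin{pmatrix} p^{1}_{0} \\ p^{1}_{1} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} p^{13}_{00} \\ p^{13}_{01} \\ p^{13}_{10} \\ p^{13}_{11} \end{pmatrix}. \tag{4}
$$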


For an arbitrary choice of the quantities $p^{12}_{00}, \ldots, p^{12}_{11}, p^{13}_{00}, \ldots, p^{13}_{11}$, there is no reason that these two 
procedures should yield the same answers for $p^{1}_{0}$ and $p^{1}_{1}$. If one requires that they do yield the same 
answer, then one must impose consistency conditions. In our example, these conditions are as follows: 

$$
p^{12}_{00} + p^{12}_{01} = p^{13}_{00} + p^{13}_{01}, \tag{5}
$$

$$
p^{12}_{10} + p^{12}_{11} = p^{13}_{10} + p^{13}_{11}. \tag{6}
$$


More generally, given a hypergraph Q, we will be interested in two types of consistency conditions. 
We will say that a collection of probabilities associated to a hypergraph is locally consistent if, whenever 
two hyperedges share a subset in common, the probabilities for that subset obtained by marginalizing 
the probabilities associated to one of the hyperedges will agree with those obtained by marginalizing the 
probabilities associated to the other hyperedge. In our example above, there were only two hyperedges 
present, so the conditions we exhibited constitute the entirety of the local consistency conditions for 
that hypergraph. We will denote the set of all locally consistent probability distributions associated to a 
hypergraph Q as L(Q). 

We will say that a collection of probabilities associated to a hypergraph is globally consistent if there 
exists a joint probability distribution on the totality of variables associated to the hypergraph such 
that the probabilities associated to any hyperedge are marginals of that joint distribution. In terms of 
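our example, that would mean that there exist probabilities $p^{123}_{000}, p^{123}_{001}, \ldots, p^{123}_{111}$ such that the following 
conditions hold: 

$$
\begin{pmatrix}
p^{12}_{00} \\ p^{12}_{01} \\ p^{12}_{10} \\ p^{12}_{11} \\ p^{13}_{00} \\ p^{13}_{01} \\ p^{13}_{10} \\ p^{13}_{11}
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
p^{123}_{000} \\ p^{123}_{001} \\ p^{123}_{010} \\ p^{123}_{011} \\ p^{123}_{100} \\ p^{123}_{101} \\ p^{123}_{110} \\ p^{123}_{111}
\end{pmatrix}. \tag{7}
$$

We will denote the set of all globally consistent probability distributions associated to a hypergraph Q as 
M(Q). 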

Because marginalizing from a set of random variables to a smaller set of variables can be accomplished 
by first marginalizing to an intermediate set and then marginalizing from the intermediate set down to the 
smaller set, it follows that global consistency implies local consistency. We will now see what conditions 
are needed in addition to local consistency to ensure global consistency. 

As in our example, we can express marginalization from the set $L$ of all variables down to a hypergraph 
Q in the form $v = Gx$, where $x$ is a vector whose components are probabilities associated to $L$, $v$ is a 
vector whose components are probabilities associated to Q, and $G$ is a suitable matrix. The consistency 
conditions can be expressed in terms of the fundamental spaces (kernel and cokernel) associated to this 
matrix [27]. In order for a vector $v$ to be expressible as $Gx$ for some $x$, we must satisfy the condition that 
$u \cdot v = 0$ for all $u \in \mathrm{coker}(G)$. In our example, the cokernel of the matrix is spanned by the following two 
row vectors: 

$$
\begin{pmatrix} 1 & 1 & 0 & 0 & -1 & -1 & 0 & 0 \end{pmatrix} \tag{8}
$$

$$
\begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 & -1 & -1 \end{pmatrix} \tag{9}
$$

This leads to the conditions 

$$
p^{12}_{00} + p^{12}_{01} - p^{13}_{00} - p^{13}_{01} = 0, \tag{10}
$$

$$
p^{12}_{10} + p^{12}_{11} - p^{13}_{10} - p^{13}_{11} = 0. \tag{11}
$$

Note that these are precisely the local consistency conditions which we exhibited earlier. It can be shown 
that the condition that $u \cdot v = 0$ for all $u \in \mathrm{coker}(G)$ will always be exactly the local consistency conditions 
(see Supplementary Material Sec. S5). 
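
The marginalization matrix G for the two-edge example and the cokernel test can be reproduced with a short Python sketch. The function names, the ordering of joint states (000, 001, ..., 111) and the ordering of rows (edge {l1, l2} first, then edge {l1, l3}) are assumptions chosen to match the example above; the numerical vector `v` uses the pairwise values that the text later quotes for Fig. 3A.

```python
from itertools import product

variables = (1, 2, 3)
edges = [(1, 2), (1, 3)]
joint_states = list(product((0, 1), repeat=len(variables)))  # 000, 001, ..., 111


def marginalization_matrix(edges, joint_states):
    """One row per (edge, local state); entry is 1 if the joint state restricts to it."""
    rows = []
    for edge in edges:
        idx = [variables.index(v) for v in edge]
        for local in product((0, 1), repeat=len(edge)):
            rows.append([1 if tuple(s[i] for i in idx) == local else 0
                         for s in joint_states])
    return rows


def left_multiply(u, G):
    """Row vector u times matrix G."""
    return [sum(u[r] * G[r][c] for r in range(len(G))) for c in range(len(G[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


G = marginalization_matrix(edges, joint_states)

# The two row vectors quoted in the text as spanning the cokernel of G.
u1 = [1, 1, 0, 0, -1, -1, 0, 0]
u2 = [0, 0, 1, 1, 0, 0, -1, -1]

# Pairwise probabilities for edges {l1, l2} and {l1, l3}, each in order 00, 01, 10, 11.
v = [0.1, 0.4, 0.4, 0.1] + [0.4, 0.1, 0.1, 0.4]
locally_consistent = all(abs(dot(u, v)) < 1e-12 for u in (u1, u2))
```

Both `left_multiply(u1, G)` and `left_multiply(u2, G)` are zero vectors, confirming that these rows annihilate every vector of the form Gx, and the dot products with `v` vanish exactly when the two edges agree on the marginal of l1.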

To obtain the global consistency conditions, we note that, if $v = Gx$, then we also have $v = Gy$ for 
any vector $y$ such that $x - y$ lies in the kernel of $G$. Choose a subspace $T$ of column vectors which is 
transverse to $\ker(G)$ such that $T$ and $\ker(G)$ together span the whole space of column vectors. Then the equation 
$v = Gx$ has a unique solution if we restrict $x$ to lie in $T$. In order for a column vector to represent a 
legitimate probability distribution, its components must all be non-negative. Hence, we conclude that 
$v$ being globally consistent is equivalent to the following system of equations and inequalities having a 
solution: 

$$
\begin{aligned}
v &= Gx, \\
x &\in T, \\
x - y &\in \ker(G), \\
y &\geq 0.
\end{aligned} \tag{12}
$$

By using a method such as Fourier-Motzkin elimination, one can eliminate the quantities $x$ and $y$ from 
this system and, after removing redundant inequalities, obtain inequalities involving only the components 
of $v$. These are the global consistency conditions. 
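
A single step of Fourier-Motzkin elimination can be sketched as follows. This is a generic textbook version written for illustration, not the authors' implementation: the row format `(a, b)` meaning a·x <= b, the function name `fm_eliminate`, and the toy system are all assumptions, and the sketch does not prune redundant inequalities.

```python
def fm_eliminate(rows, k):
    """Eliminate variable k from a system of inequalities sum_j a[j] * x[j] <= b.

    Each row is a pair (a, b) with `a` a list of coefficients. The returned rows
    all have a zero coefficient for variable k and describe the projection of the
    feasible set onto the remaining variables. Redundant rows are not removed.
    """
    pos = [r for r in rows if r[0][k] > 0]   # upper bounds on x_k
    neg = [r for r in rows if r[0][k] < 0]   # lower bounds on x_k
    out = [r for r in rows if r[0][k] == 0]  # rows that do not involve x_k
    for ap, bp in pos:
        for an, bn in neg:
            cp, cn = -an[k], ap[k]  # positive multipliers that cancel x_k
            a = [cp * x + cn * y for x, y in zip(ap, an)]
            out.append((a, cp * bp + cn * bn))
    return out


# Toy system in variables (t, a, c): t <= a, t >= 0, t >= c.
system = [
    ([1, -1, 0], 0),   # t - a <= 0
    ([-1, 0, 0], 0),   # -t <= 0
    ([-1, 0, 1], 0),   # c - t <= 0
]
projected = fm_eliminate(system, 0)
```

Eliminating t pairs every upper bound with every lower bound, which here leaves the two conditions a >= 0 and c <= a. The same pairing of lower and upper bounds on a single free joint probability is what produces inequalities on the marginals alone.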

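In our example, $\ker(G)$ is spanned by the following two column vectors: 

$$
\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\qquad
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}. \tag{13}
$$

As our transverse space $T$, we will choose the space spanned by the following basis: 

$$
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}. \tag{14}
$$

With this choice, the condition $x \in T$ reduces to the equations $x_4 = x_8 = 0$. The conditions $x - y \in \ker(G)$ 
then become 

$$
y_1 - x_1 = x_2 - y_2 = x_3 - y_3 = y_4, \tag{15}
$$

$$
y_5 - x_5 = x_6 - y_6 = x_7 - y_7 = y_8. \tag{16}
$$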

If we solve these for the $x$'s, substitute the result into the equation $v = Gx$ and eliminate the $y$'s between 
the resulting equations and the inequalities $y \geq 0$, we find the conditions $v \geq 0$. This, of course, is 
just the condition that the probabilities be nonnegative. Thus, for the case of this simple hypergraph, local 
consistency suffices to ensure global consistency. In Sec. 6, we will see that this is not always the case 
and that the inequalities obtained by elimination impose more conditions on the probabilities than just 
nonnegativity. 


6 Example of unsatisfiable constraints 


We will now exemplify equations and inequalities that need to be satisfied in order to guarantee the consistency 
conditions for the case of three variables that form the simplest nontrivial cycle where inconsistency 
may arise. Suppose that $L = \{l_1, l_2, l_3\}$, $P = \{0, 1\}$, and Q $= \{\{l_1, l_2\}, \{l_2, l_3\}, \{l_3, l_1\}\}$. 

Local consistency means that the probability for the variable $l_1$ to be associated to a given state is 
the same regardless of which of the biological network modules containing $l_1$ we marginalize to obtain it. 
Mathematically, this reduces to two equations corresponding to the cases 
when the state of $l_1$ is 0 or 1. If we do likewise with $l_2$ and $l_3$ in place of $l_1$ we obtain the set of local 
consistency conditions: 

$$
\begin{aligned}
p^{12}_{00} + p^{12}_{01} &= p^{1}_{0} = p^{13}_{00} + p^{13}_{01}, \\
p^{12}_{10} + p^{12}_{11} &= p^{1}_{1} = p^{13}_{10} + p^{13}_{11}, \\
p^{12}_{00} + p^{12}_{10} &= p^{2}_{0} = p^{23}_{00} + p^{23}_{01}, \\
p^{12}_{01} + p^{12}_{11} &= p^{2}_{1} = p^{23}_{10} + p^{23}_{11}, \\
p^{13}_{00} + p^{13}_{10} &= p^{3}_{0} = p^{23}_{00} + p^{23}_{10}, \\
p^{13}_{01} + p^{13}_{11} &= p^{3}_{1} = p^{23}_{01} + p^{23}_{11}.
\end{aligned} \tag{17}
$$


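These result from applying the method outlined in Sec. 5 to enumerate all local consistency conditions. 
Using the local consistency conditions for our example we can derive a set of inequalities that determine 
L(Q): 

$$
\begin{aligned}
p^{12}_{00} &= 1 + p^{12}_{11} - p^{13}_{10} - p^{13}_{11} - p^{23}_{10} - p^{23}_{11} \geq 0, \\
p^{12}_{01} &= -p^{12}_{11} + p^{23}_{10} + p^{23}_{11} \geq 0, \\
p^{12}_{10} &= -p^{12}_{11} + p^{13}_{10} + p^{13}_{11} \geq 0, \\
p^{13}_{00} &= 1 - p^{13}_{10} - p^{23}_{01} - p^{23}_{11} \geq 0, \\
p^{13}_{01} &= -p^{13}_{11} + p^{23}_{01} + p^{23}_{11} \geq 0, \\
p^{23}_{00} &= 1 - p^{23}_{01} - p^{23}_{10} - p^{23}_{11} \geq 0,
\end{aligned} \tag{18}
$$

combined with the trivial inequalities that force all probabilities to be nonnegative. Substituting the 
numbers from Fig. 3A (which are $p^{12}_{00} = 0.1$, $p^{12}_{01} = 0.4$, $p^{12}_{10} = 0.4$, $p^{12}_{11} = 0.1$, $p^{13}_{00} = 0.4$, $p^{13}_{01} = 0.1$, $p^{13}_{10} = 
0.1$, $p^{13}_{11} = 0.4$, $p^{23}_{00} = 0.4$, $p^{23}_{01} = 0.1$, $p^{23}_{10} = 0.1$, $p^{23}_{11} = 0.4$) into Eq. 18 demonstrates that the local 
conditions are satisfied. 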

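The global consistency conditions form an underdetermined system of linear equations for the putative 
global distribution so their solution will assume the form of an affine subspace. The following equations 
arise as a result of eliminating $x$ from the equations determined by the conditions $v = Gx$, $x \in T$, 
$x - y \in \ker G$: 

$$
\begin{aligned}
p^{123}_{001} &= p^{12}_{00} - p^{123}_{000}, \\
p^{123}_{010} &= p^{13}_{00} - p^{123}_{000}, \\
p^{123}_{100} &= p^{23}_{00} - p^{123}_{000}, \\
p^{123}_{011} &= p^{12}_{01} - p^{123}_{010} = p^{12}_{01} - p^{13}_{00} + p^{123}_{000}, \\
p^{123}_{101} &= p^{12}_{10} - p^{23}_{00} + p^{123}_{000}, \\
p^{123}_{110} &= p^{13}_{10} - p^{23}_{00} + p^{123}_{000}, \\
p^{123}_{111} &= 1 - p^{12}_{01} - p^{13}_{10} - p^{23}_{01} - p^{123}_{000}.
\end{aligned} \tag{19}
$$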

The remaining condition $y \geq 0$ from Eq. 12 states that all the probabilities must be nonnegative numbers, 
which is only possible if the putative marginals satisfy suitable inequalities given by 

$$
\begin{aligned}
p^{123}_{000} &\leq \min\{ p^{12}_{00},\; p^{13}_{00},\; p^{23}_{00},\; 1 - p^{12}_{01} - p^{13}_{10} - p^{23}_{01} \}, \\
p^{123}_{000} &\geq \max\{ 0,\; p^{13}_{00} - p^{12}_{01},\; p^{23}_{00} - p^{12}_{10},\; p^{23}_{00} - p^{13}_{10} \}.
\end{aligned} \tag{20}
$$

A minimal set of inequalities is then expressed by substituting the equalities from Eq. 18 into the 
inequalities determined by Eq. 20 and eliminating redundancies, resulting in 

$$
\begin{aligned}
p^{12}_{11} - p^{13}_{11} + p^{23}_{01} &\geq 0, \\
1 + p^{12}_{11} - p^{13}_{10} - p^{23}_{01} - p^{23}_{10} - p^{23}_{11} &\geq 0, \\
-p^{12}_{11} + p^{13}_{10} + p^{23}_{11} &\geq 0, \\
-p^{12}_{11} + p^{13}_{11} + p^{23}_{10} &\geq 0.
\end{aligned} \tag{21}
$$

The inequalities from Eq. 18 and Eq. 21 combined with the nonnegativity inequalities together determine 
the global polytope M(Q). For the example given in Fig. 3A, the first of the inequalities in Eq. 21 is 
violated, as shown in Eq. 22: 

$$
\begin{aligned}
0.1 - 0.4 + 0.1 &\not\geq 0, \\
1 + 0.1 - 0.1 - 0.1 - 0.1 - 0.4 &\geq 0, \\
-0.1 + 0.1 + 0.4 &\geq 0, \\
-0.1 + 0.4 + 0.1 &\geq 0.
\end{aligned} \tag{22}
$$

This indicates that data consistent with Fig. 3A could not derive from the network depicted there. 
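
The three-cycle check can be reproduced numerically with the following Python sketch. It rests on several assumptions that should be read as such: each pairwise table is indexed by the states of its two variables in increasing variable order (so `p13[(a, c)]` refers to l1 = a, l3 = c), the one value in the text's list for Fig. 3A that did not survive extraction is taken to be 0.4 because normalization forces it, and the bounds on the joint probability of state 000 are obtained by requiring each of the eight joint probabilities, written in terms of that one free value and the pairwise marginals, to be nonnegative. The function names are hypothetical.

```python
def locally_consistent(p12, p13, p23, tol=1e-12):
    """Check that every pair of edges agrees on the marginal of the variable they share."""
    gaps = []
    for s in (0, 1):
        # l1 is shared by edges {l1, l2} and {l1, l3}
        gaps.append(sum(p12[(s, t)] for t in (0, 1)) - sum(p13[(s, t)] for t in (0, 1)))
        # l2 is shared by edges {l1, l2} and {l2, l3}
        gaps.append(sum(p12[(t, s)] for t in (0, 1)) - sum(p23[(s, t)] for t in (0, 1)))
        # l3 is shared by edges {l1, l3} and {l2, l3}
        gaps.append(sum(p13[(t, s)] for t in (0, 1)) - sum(p23[(t, s)] for t in (0, 1)))
    return all(abs(g) < tol for g in gaps)


def p000_bounds(p12, p13, p23):
    """Lower and upper bounds on the joint probability of state 000.

    A joint distribution with the given pairwise marginals exists only if lower <= upper.
    """
    lower = max(0.0,
                p13[(0, 0)] - p12[(0, 1)],
                p23[(0, 0)] - p12[(1, 0)],
                p23[(0, 0)] - p13[(1, 0)])
    upper = min(p12[(0, 0)], p13[(0, 0)], p23[(0, 0)],
                1.0 - p12[(0, 1)] - p13[(1, 0)] - p23[(0, 1)])
    return lower, upper


# Pairwise tables quoted in the text for Fig. 3A (one entry inferred from normalization).
anti = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.1}
corr = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

local_ok = locally_consistent(anti, corr, corr)
lower, upper = p000_bounds(anti, corr, corr)
globally_consistent = lower <= upper

# For contrast: three positively correlated edges admit a joint distribution.
lower_c, upper_c = p000_bounds(corr, corr, corr)
```

For the Fig. 3A values the lower bound (about 0.3) exceeds the upper bound (0.1), so no joint distribution exists even though all single-variable marginals agree. The gap of about -0.2 is the same number that appears in the violated first inequality of Eq. 22.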

7 Cyclic network contexts can impose unsatisfiable constraints 

Each node of the SH graph in Fig. 3A can be associated to the probability distribution that specifies 
probabilities for each biological variable to be observed in each of the states determined by the coarse-graining 
process described in Sec. 3. Each edge of the graph specifies a joint probability distribution for 
both of the nodes it contains (or connects) to simultaneously take on a given pair of values. Note that 
this does not imply the existence or absence of a physical interaction between the variables represented 
by these two nodes. Together, these probabilities represent constraints that the network context may 
impose upon the network. We assume three variables are observed via all possible pairwise combinations 
and that via the coarse-graining process we have binned the state of each variable into one of two classes. 
Each node of the graph in Fig. 3A represents a probability distribution over the observation of each 
variable in either of the two states established in the coarse-graining process. Each of the probability 
tables adjacent to each edge in the graph assigns a probability distribution to the set of maps from the 
nodes connected by the edge to all possible combinations of the network states. As these maps take 
collections of biological network variables as input and produce collections of network states as outputs 
we refer to them as network-network state maps and thus to the associated probability distributions as 
probability distributions over network-network state maps. 

Suppose the normalized contingency tables in Fig. 3A are meant to represent the ostensible structure 
and parameters of a biological process. It is often necessary to attempt to infer the parameters of 
such a model from data under the assumption that the structure of a given network architecture falls 
within the model class defined by a given graph. Fig. 3B represents a case in which a hypothetical 
dataset is consistent with its derivation from a joint probability distribution whereas Fig. 3C represents a 
case of inconsistency where the pairwise distributions are each individually consistent distributions, but, 
together, the three pairwise distributions are not consistent with any joint distribution over the states of 
all three network variables. This inconsistency is made possible by the fact that the network architecture 
in Fig. 3A contains a cycle [28-30] and that we have given an ostensible data set leading to the inference 
of parameters that could not possibly derive from a joint probability distribution over all three network 
variables. 
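
The kind of inconsistency described above can be illustrated with a minimal sketch. The pairwise tables used here are hypothetical (every pair of binary variables is perfectly anticorrelated) and are not the values shown in Fig. 3; the function names are likewise illustrative. The check is deliberately simple: it only asks whether any joint state survives the zero entries of the pairwise tables, which is sufficient to show inconsistency in this example but is not a general test of global consistency.

```python
from itertools import product

# Hypothetical pairwise tables on binary states: every pair is perfectly
# anticorrelated. These values are illustrative, not those of Fig. 3A.
ANTI = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
pairwise = {(0, 1): ANTI, (1, 2): ANTI, (0, 2): ANTI}


def node_marginal(table, position):
    """Marginalize a pairwise table onto one of its two variables."""
    out = {0: 0.0, 1: 0.0}
    for states, prob in table.items():
        out[states[position]] += prob
    return out


def locally_consistent(tables):
    """True when all tables sharing a variable agree on its marginal."""
    seen = {}
    for (i, j), table in tables.items():
        for var, pos in ((i, 0), (j, 1)):
            marg = node_marginal(table, pos)
            if var in seen and any(abs(seen[var][s] - marg[s]) > 1e-12 for s in (0, 1)):
                return False
            seen.setdefault(var, marg)
    return True


def admissible_joint_states(tables, n_vars=3):
    """Joint states to which a joint distribution could assign positive mass.

    A joint state is excluded as soon as one pairwise table assigns
    probability zero to its projection onto that pair of variables.
    """
    return [
        x for x in product((0, 1), repeat=n_vars)
        if all(table[(x[i], x[j])] > 0 for (i, j), table in tables.items())
    ]
```

For the three-cycle, `locally_consistent(pairwise)` holds while `admissible_joint_states(pairwise)` is empty, so no joint distribution over the three variables can reproduce all three tables. Removing one edge (an acyclic chain) leaves two admissible joint states, and the remaining constraints become satisfiable.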

If this situation arises, it indicates some systematic error in the transfer of information, whether the 
error is intrinsic to the system, wherein a network has inconsistent constraints placed upon it by its 
network context, or arises as part of the scientific data collection process. In the former case, this can be 
resolved by modifying the inconsistent constraints so that they become consistent, with or without 
modifying the network architecture in doing so. In the latter case, this may result from employing a model 
which 1) takes insufficient account of the network context and 2) relies on coarse-grained observations. 
In either case, the synthetic gene circuit schematized in Fig. S4 serves as one mechanism implementing 
the example presented in Supplementary Material Sec. S5.1. It consists of four genes each of which is 
capable of taking on three different states [31]. However, observing two out of the three states measured 
pairwise from three out of the four genes could result in data that would appear to be inconsistent. Such 
an observation would demonstrate, without requiring knowledge of the correct network architecture, 
that the current model is insufficient to represent the underlying process. 

For the case of the architecture in Fig. 3 A, and moreover for any network architecture of any size 
that contains one or more cycles, the possibility of finding a joint distribution over all network variables 
that satisfies all constraints capable of being imposed upon it requires the implicit assumption that the 
structure of the network context can be viewed simultaneously as that of Fig. 2C top and that of Fig. 2C 
bottom. The spaces of probability distributions corresponding to the constraints that can be imposed 
upon the two network architectures contrasted in Fig. 2C are different. We can now apply the process 
described in Sec. 5 to classify the geometries and thus relationships among the spaces of probability 
distributions associated to constraints that can be imposed on all possible network architectures with a 
given number of variables. 

8 Geometry of probabilistic constraints on network states 

The relationships among possible network architectures are given by the lattice, which in this case 
indicates ordering by subset inclusion, of reduced subsets of biological network variables (i.e. collections of 
subsets of variables where no subset in the collection is a subset of another one; Sec. 2 and Supplementary 
Material Sec. S2). For example, Fig. 4A shows the lattice of reduced subsets of three variables. We are 
only interested in those subsets that contain at least one instance of each variable. Restricting to the 
subsets of variables satisfying this condition corresponds to the region highlighted with a gray background 
in Fig. 4A. Each network architecture corresponds to a different modularization of the network-network 
state maps by the network context. For example, Fig. 4B shows in the same vertical order the different 
maps induced by the three architectures highlighted in green in Fig. 4A. 
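
As a rough illustration of the collection of architectures being ordered here, the following sketch enumerates, by brute force, every collection of non-empty subsets of a small variable set that (i) contains each variable at least once and (ii) has no subset nested inside another. The function names are our own for illustration, and the enumeration is only practical for a handful of variables.

```python
from itertools import combinations


def nonempty_subsets(variables):
    """All non-empty subsets of the variable set, as frozensets."""
    items = sorted(variables)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def is_architecture(modules, variables):
    """Covering condition plus the requirement that no module is nested in another."""
    covers = set().union(*modules) == set(variables) if modules else False
    reduced = all(not (a < b) for a in modules for b in modules)
    return covers and reduced


def architectures(variables):
    """Enumerate every reduced covering of the variable set (brute force)."""
    subsets = nonempty_subsets(variables)
    found = []
    for r in range(1, len(subsets) + 1):
        for modules in combinations(subsets, r):
            if is_architecture(modules, variables):
                found.append(frozenset(modules))
    return found
```

For three variables, `architectures({1, 2, 3})` returns nine collections, including the single module containing all three variables, the three pairwise modules forming a cycle, and the three singletons.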

We consider those network architectures found lower in the lattice of Fig. 4 A to be of higher modularity 
because each corresponds to the increasing restriction from placing constraints on higher- to placing 
constraints on lower-order correlations among variables. Fig. 4B top corresponds to the least modular 
network architecture because constraints are placed upon correlations among all three variables. Fig. 4B 
middle exhibits an elevated degree of modularity because constraints are placed upon correlations among 
pairs of variables. Similarly, Fig. 4B bottom is even more modular because constraints are placed upon 
each variable individually. 

Each of the network architectures in Fig. 4A can be associated to a pair of spaces of probability 
distributions over network-network state maps. These correspond to the spaces of globally, $\mathbb{M}(\mathcal{G})$, and 
locally, $\mathbb{L}(\mathcal{G})$, consistent distributions described in Sec. 5 and Sec. S5. Fig. 4C schematically depicts 
the relationships among the probability distributions associated to the corresponding architectures and 
network-network state maps in Fig. 4B. For Fig. 4C top, $\mathbb{M}(\mathcal{G}) = \mathbb{L}(\mathcal{G})$. The inconsistency noted in the 
previous section between the architectures Fig. 4B top and middle is a result of the differing geometries 
in Fig. 4C middle. There, the smaller darker gray region, $\mathbb{M}(\mathcal{G})$, defined by the inequalities expressed in 
Eq. 18 and Eq. 21 corresponds to the space of probability distributions defined over all possible 
network-network state maps associated to the network architecture in Fig. 4B middle. Similarly, the lighter gray 
region defined by Eq. 18 alone corresponds to $\mathbb{L}(\mathcal{G})$ for Fig. 4B middle and thus $\mathbb{M}(\mathcal{G}) \subset \mathbb{L}(\mathcal{G})$ in the 
latter case. 


9 Naive likelihood of sampling unsatisfiable constraints 

Relationships between spaces of potential constraints placed upon patterns of network states like that of 
Fig. 4C middle occur for all network architectures defined over any number of variables so long as there 
exists at least one cycle in the corresponding network architecture (Sec. 7). For the case of three variables, 
there is only one class of graphs containing a cycle, which is that of Fig. 4B middle. For the case of 
four variables there are nine different classes of hypergraphs containing cycles and these nine classes can 
be split into two groups depending upon whether or not the edges of the graphs are each restricted to 
represent correlations among only two variables. Fig. S5 shows the components of the analogous lattice to 
that of Fig. 4A as well as these different classes of network architectures on four variables having cycles. 

Given this larger collection of network architectures with cycles we can assess the relative sizes of the 
spaces $\mathbb{M}(\mathcal{G})$ and $\mathbb{L}(\mathcal{G})$ (Fig. 4C middle) of probability distributions over network-network state maps. We 
assess the likelihood of choosing a point in $\mathbb{M}(\mathcal{G})$ at random by computing the ratio of the volume of $\mathbb{M}(\mathcal{G})$ 
(associated to the non-modular network architectures analogous to that of Fig. 4B top with a single edge 
containing all four variables), whose architecture and thus volume is fixed, to that of $\mathbb{L}(\mathcal{G})$, whose volume 
varies according to each of the cyclic graphs associated to a network architecture on four variables. 
We refer to this number as the global:local volume ratio, $\mathrm{Vol}(\mathbb{M}(\mathcal{G}))/\mathrm{Vol}(\mathbb{L}(\mathcal{G}))$ (Supplementary 
Material Sec. S5 and Sec. S6). The comparison defined by this ratio is meaningful since $\mathbb{L}(\mathcal{G})$, Eq. S23, 
and $\mathbb{M}(\mathcal{G})$, Eq. S24, are of the same dimension. In the case where the constraints defining $\mathbb{L}(\mathcal{G})$ are 
eliminated, the analog of this volume ratio would be 0 for all $\mathcal{G}$. This volume ratio determines the a priori 
likelihood of observing inconsistency for a given network architecture. The consistency check involved in 
computing this ratio can be used as a test demonstrating, for those cases exhibiting inconsistency, that 
the model being used is incorrect in the sense that it does not correspond sufficiently to the actual network 
context determining the constraints placed upon the network. Consider the probability of locally versus 
globally consistent observations ($p(\mathbb{L}(\mathcal{G})_O)$ vs. $p(\mathbb{M}(\mathcal{G})_O)$, respectively) separately from the probability of 
locally versus globally consistent models ($p(\mathbb{L}(\mathcal{G})_M)$ vs. $p(\mathbb{M}(\mathcal{G})_M)$, respectively) that accurately reflect the 
underlying process. We can then estimate the probability of having a locally consistent model despite 
obtaining globally consistent observations, $p(\mathbb{L}(\mathcal{G})_M \mid \mathbb{M}(\mathcal{G})_O)$, via a simple application of Bayes' theorem 

$$
p(\mathbb{L}(\mathcal{G})_M \mid \mathbb{M}(\mathcal{G})_O) = \frac{p(\mathbb{M}(\mathcal{G})_O \mid \mathbb{L}(\mathcal{G})_M)\, p(\mathbb{L}(\mathcal{G})_M)}{p(\mathbb{M}(\mathcal{G})_O \mid \mathbb{L}(\mathcal{G})_M)\, p(\mathbb{L}(\mathcal{G})_M) + p(\mathbb{M}(\mathcal{G})_O \mid \mathbb{M}(\mathcal{G})_M)\, p(\mathbb{M}(\mathcal{G})_M)},
$$

where $p(\mathbb{M}(\mathcal{G})_O \mid \mathbb{M}(\mathcal{G})_M) = 1$, the volume ratio described above corresponds to $p(\mathbb{M}(\mathcal{G})_O \mid \mathbb{L}(\mathcal{G})_M)$, and 
one could consider the impact of different prior probabilities, $p(\mathbb{L}(\mathcal{G})_M)$, of having a locally consistent 
model. 
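
The dependence of this posterior on the prior can be explored with a short sketch. It takes the volume ratio as the likelihood of globally consistent observations under a merely locally consistent model, and sets the likelihood under a globally consistent model to one, as stated above. The function name is hypothetical, and the sketch additionally assumes that the two model hypotheses are exhaustive, so that the prior for a globally consistent model is one minus the prior for a locally consistent one; the text does not state this explicitly.

```python
def posterior_local_model(volume_ratio, prior_local):
    """Posterior probability of a locally consistent model given globally
    consistent observations, by Bayes' theorem.

    volume_ratio : likelihood of globally consistent observations under a
                   merely locally consistent model (the global:local ratio)
    prior_local  : prior probability of a locally consistent model

    Assumption (ours): the prior of a globally consistent model is
    1 - prior_local. The likelihood under a globally consistent model is 1.
    """
    if not (0.0 <= volume_ratio <= 1.0 and 0.0 <= prior_local <= 1.0):
        raise ValueError("volume_ratio and prior_local must lie in [0, 1]")
    prior_global = 1.0 - prior_local
    numerator = volume_ratio * prior_local
    denominator = numerator + 1.0 * prior_global
    if denominator == 0.0:
        raise ValueError("posterior undefined: observations have zero probability")
    return numerator / denominator
```

With a volume ratio of one (an acyclic architecture) the posterior equals the prior, since globally consistent observations then carry no information that distinguishes the two kinds of model; as the ratio shrinks, globally consistent observations increasingly favor a globally consistent model.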

Fig. 5A and B show the results of computations of this global:local volume ratio for fourteen different 
hypergraphs. Fig. 5C and D show the dimension of the spaces within which these volumes are computed. 
The spaces are equivalent, and thus the volume ratio is equal to one, for graphs lacking cycles (e.g. the 
first three graphs along the x-axis of Fig. 5A). For the nine network architectures in Fig. 5A and B 
containing cycles, the volume ratio is strictly less than one. This quantifies the probability that the 
network architecture depicted along the x-axis will be able to satisfy the constraints that the associated 
network context is capable of placing upon it. 
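
To give a feel for how such a volume ratio can be estimated, the following Monte Carlo sketch treats a deliberately restricted case that is not one of the computations reported in Fig. 5: the three-cycle on binary variables, with every single-variable marginal fixed to be uniform. Under that assumption each pairwise table is determined by one correlation in [-1, 1], the locally consistent tables fill a cube, and the globally consistent ones are those satisfying four linear inequalities (a tetrahedron inscribed in the cube). The function names and the choice of slice are ours.

```python
import random


def globally_consistent(c12, c13, c23):
    """Correlations of three +/-1 variables with zero means admit a joint
    distribution exactly when these four inequalities hold; each left-hand
    side is four times the weight of one sign pattern and its mirror image."""
    return (
        1 + c12 + c13 + c23 >= 0
        and 1 + c12 - c13 - c23 >= 0
        and 1 - c12 + c13 - c23 >= 0
        and 1 - c12 - c13 + c23 >= 0
    )


def estimate_ratio(n_samples=100_000, seed=0):
    """Fraction of uniformly sampled locally consistent points (the cube of
    pairwise correlations) that are also globally consistent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        point = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if globally_consistent(*point):
            hits += 1
    return hits / n_samples
```

In this slice the exact ratio is 1/3, the volume of a tetrahedron inscribed in a cube relative to the cube, and the estimate converges to that value. The ratios in Fig. 5 are computed over the full spaces, without fixing the single-variable marginals, and so need not equal this number.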

10 Potential for unsatisfiable constraints may bias the sampling 
of network architectures by evolutionary processes 

The satisfiability of constraints capable of being placed on the various architectures is logically a function 
of whether or not the network architecture is cyclic or acyclic. For those network architectures containing 
cycles, there are certain functional requirements that can be achieved so long as only local and not global 
consistency is required of them. Once global consistency is imposed as in the structure corresponding 
to the joint correlations among all variables, those functions that were accessible when only local 
consistency was imposed are unavailable. For acyclic network architectures, there is no difference between the 
satisfiability of locally or globally imposed constraints. Fig. 6 right shows a schematic of one potential 
scenario by which a given cyclic network architecture may be selected against. The black points in the 
center represent an initial condition of a stochastic process that is selected for its ability to achieve one 
of two different stationary distributions represented by the blue and the red points respectively. This 
is equivalent to placing a fitness landscape given by a function whose maximum is located at the given 
points and defined over the relevant space of probability distributions. The network architecture 
represented in the top row of Fig. 6 is able to achieve as its stationary distribution any of the constraints 
capable of being imposed upon it that are consistent with its architecture because it is acyclic. On the 
other hand, the network architecture in the bottom row is incapable of achieving certain constraints that 
may be imposed upon it by a network context consistent with its architecture because it is cyclic. 

When the selective pressure induced is equivalent to the distribution located at the blue point, or at 
any other point within the dark gray region, the two architectures are essentially equivalent with 
respect to the statistics of samples from their corresponding probability distributions and they can thus 
be considered as members of an evolutionarily neutral space. On the other hand, selective pressure 
equivalent to the probability distributions located at the red point differentiates between the networks of 
the top and bottom row or equivalently between the network of the bottom row when global consistency 
is imposed versus the same network when only local consistency conditions are imposed. The same 
qualitative relationship holds true for the spaces of probability distributions of all network architectures 
of any size and for any number of different levels in the discrete coarse-graining of network states so long 
as the graph associated to the relevant correlations among variables contains at least one cycle. 

The distinction between cyclic and acyclic network architectures with respect to the ability to have 
unsatisfiable constraints placed upon them is sharp. However, within the class of cyclic network 
architectures, the likelihood of having unsatisfiable constraints imposed on a given network architecture increases, 
at least approximately, with the number of cycles in the given network architecture (Fig. 5 and Sec. 9). 
This indicates that selection against network architectures with a larger number of nested 
cycles is likely to be stronger than that against network architectures with a relatively smaller number of 
cycles. Initiating an evolutionary process with a large network containing many nested cycles may then 
result in the elimination of some cycles, via any process that can result in cycle breakage, until the number of 
nested cycles decreases sufficiently that the intrinsic strength of selection against cycles reaches 
equilibrium with the rate at which new cycles form. One possible outcome, depending upon the overall relationship 
between these rates, is a hierarchical-modular architecture in which a globally hierarchical network has a number 
of cyclic modules, each of whose size is small relative to the overall size of the network, interspersed 
throughout. 

11 Discussion 

When biological networks are studied, we remove a subnetwork from a larger context [32]. Depending 
upon the scale of the study, the boundary between subnetwork and network context may vary. For 
example, in a relatively small-scale study the subnetwork may consist of a few genes and metabolites where 
the context is comprised of other genes, metabolites, and intracellular structures. For relatively 
large-scale models attempting to take into account all of the processes comprising a single-celled organism, the 
network context consists of the variables in that organism’s environment. In even larger-scale studies of 
multicellular organisms, populations, or communities the same general principle applies by appropriately 
shifting the boundary between the subnetwork and network context. 

One salient feature applying at any scale is that the structure of the network context plays a crucial 
role in determining whether or not unsatisfiable constraints on the stochastic dynamical patterns of 
network states may arise at all. We note based on previously existing results that mutually incompatible 
constraints are only capable of arising when the network architecture contains a cycle. Moreover, our 
results suggest the likelihood of mutually incompatible constraints arising relative to network architecture 
increases with the number of cycles in that network architecture. An evolutionary process exhibiting 
uniform sampling over the space of network architectures and the space of possible constraints within 
each network architecture would thus be expected to exhibit a bias toward the breakage of cycles. One 
would not expect such a bias to eliminate the existence of cycles in biological networks. However, it is 
reasonable to expect on the basis of this result a kind of hierarchical modularity in which modules that may 
possess cycles and are small relative to the overall size of the network exist within a globally hierarchical 
network structure. Of course, there are other factors which may contribute to the development of such 
network architectures. 

It will be important in future work to examine this prediction more closely in the context of developing 
bottom-up stochastic process models that allow for the explicit encoding and solution of models of more 
complex biological networks [33,34]. It is possible that the specific dynamics of a given network context 
may lead to apparent access to correlations that are otherwise inaccessible. In the case of gene-regulatory 
networks, this may occur via a form of cis-regulation that enables the breakage of statistical dependence 
in a time-dependent manner (Fig. S4). But such a scenario seems much less plausible than the ability 
to resolve inconsistency by breaking cycles in the network architecture. In the long term, the latter 
corresponds to what is observed in hierarchically organized transcription factor networks [21,35-37]. The 
mechanism outlined here is consistent with previous analyses of hierarchical modular gene regulatory 
network architectures [35-41]. 

To contribute to the broader goal of establishing an integrated framework that synthesizes 
hypothesized intrinsic and extrinsic constraints necessary to understand the functioning and evolution of biological 
systems, here we have traced a path from biological network architecture to network state constraint 
satisfiability, and, via the impact of network states on higher-level properties culminating in macroscopically 
observable phenotypes, to evolutionary processes. In the particular context of gene-regulatory networks, 
one goal of measuring gene expression at transcriptomic scale is to uncover the structure of the 
generative process encoded in the interactions involved, but, so far, even the most sophisticated methods of 
describing them at the mechanistic level are only solvable for extremely simple regulatory network 
architectures [33,34]. This fact has, in part, motivated computational biologists to develop a large collection 
of algorithms to infer aspects of this structure [1,42] and experimental biologists to compare networks on 
the basis of their hierarchical and modular architecture [43]. Our model and its framework put forward 
a class of fundamental constraints that may impact the expected structure of biological networks. The 
fact that the satisfiability of the space of possible constraints that can be imposed upon a network is 
dependent upon the structure of the network context provides a mechanism by which natural selection 
may exhibit a fundamental bias in its sampling of biological network architectures. 

Figure Legends 

Figure 1. Abstract influence representation of biological networks. (row 1) The systems 
biology graphical notation (SBGN) is capable of representing arbitrary biological networks including 
processes that involve metabolites, signaling molecules, genes, and enzymes [11]. Only a fragment of the 
SBGN language, where all nodes have equivalent types, is indicated here. (row 2) We abstract from the 
SBGN representation of a biological network to a graph representing the abstract influence (AI) graph 
indicating coupling among a subset of the entities present in a biological network. (row 3) For economy 
of representation we use a short hand (SH) hypergraph to denote the AI graph. The topologies of the AI 
and SH graphs are equivalent and this is what we refer to as network architecture. 












Figure 2. Coarse-graining of biological network data. (A) SBGN (top) and SH (bottom) 
representation of two different biological networks. (B) Example binary coarse-graining of biological 
network data. For each sample a measurement is taken for all three variables in the focal subnetwork. 
The levels are binned into one of two classes represented by the red and blue bars representing 
relatively high and low levels respectively. (C) Heat map representation of coarse-grained data under 
the assumption of two different network architectures. The samples on top and the associated 
measurement structure correspond to the case where constraints are placed on all three variables by a 
single element of the network context (Fig. 6 top row). The bottom represents the case where all three 
pairs are each independently constrained by elements of the network context (Fig. 6 bottom row). 

Figure 3. Model of inconsistent network state data. (A) An example structured according to 
the bottom row of Fig. 2C. The graph contains three nodes each representing one of the variables 
depicted in Fig. 2 A. The dashed gray line coming from each variable points to the single variable 
marginal distribution depicted in the associated table. The pairwise edge marginal distributions are 
placed along the edges. The highlighted table entries (top) represent the constraint probabilities on the 
network-network state maps represented by the equivalently colored arrows (bottom). The binary 
values representing variable states derive from the coarse-graining process over continuous network state 
data depicted in Fig. 2B. (B) (top-left) Representation of three hundred samples comprising a data set 
consistent with a uniform distribution over all network-network state maps from the model in panel A. 
(top-middle) The joint probability distribution given in the top-left panel. The green bars in the 
bottom three panels represent the marginalization of this joint distribution according to the structure of 
the graph. The yellow bars in the bottom three panels represent the ostensible marginal distributions 
determined via the sum-product algorithm (loopy belief propagation) [44]. (top-right) A schematic 
where the top gray ellipse represents the space of joint probability distributions on three variables and 
the hexagon represents the pairwise marginals within their natural embedding space (see Fig. 4). For 
this data, maximum likelihood estimation (exact) and loopy belief propagation (approximate) yield 
equivalent points within the space of pairwise marginals. (C) Same as B, but with data consistent with 
Fig. 2C bottom, which in the limit of a large amount of data would converge to the ostensible node and 
edge marginal distributions in panel A. For the given data set, maximum likelihood estimation and 
loopy belief propagation yield different points within the natural embedding space of the pairwise 
marginals. 

Figure 4. Relationship between biological network models and spaces of probability 
distributions. (A) The collection of all possible network architectures over three variables forms a 
lattice represented here by its Hasse diagram. An analogous lattice of network architectures exists for 
any number of variables. The Hasse diagram shows the manner in which network architectures are 
hierarchically related and are thus able to be embedded within one another. (B) Explicit examples of 
network-network state maps over three network architectures from panel A highlighted in green are 
represented as arrows mapping the variables represented as nodes of the graph underlying the network 
architecture into the collection of network state values determined by the coarse-graining chosen in 
Fig. 2B. There is a different collection of possible network-network state maps depending upon the 
structure of the network architecture. (C) Each collection of network-network state maps, one 
representative for each network architecture depicted in panel B, is associated to a space of probability 
distributions defined over it. Moreover, the spaces of probability distributions associated to each graph 
are related via marginalization maps. The top level represents a joint probability distribution (i.e. $\Delta^7$: 
the eight-dimensional probability simplex) which can be marginalized to the middle space (i.e. $(\Delta^3)^{\oplus 3}$: 
the union of three copies of the four-dimensional probability simplex) which in turn can be marginalized 
to the bottom space (i.e. $(\Delta^1)^{\oplus 3}$: the union of three copies of the two-dimensional probability simplex). 
The light gray polytope in the middle, $\mathbb{L}(\mathcal{G})$, represents the space of distributions consistent with the 
marginalization map from the middle to the bottom. The dark gray polytope, $\mathbb{M}(\mathcal{G})$, represents the 
space of probability distributions consistent with marginalization from the top to the middle. 

Figure 5. Non-modular to modular probability space volume ratio. (A) and (B) show the 
volume ratio associated to 2-regular and non-2-regular network architectures respectively. The 
(hyper)graph associated to each value of the volume ratio is displayed along the x-axis of each panel. 
(C) and (D) show the natural dimension of the space of probability distributions associated to $\mathbb{M}(\mathcal{G})$ 
and $\mathbb{L}(\mathcal{G})$ for each hypergraph. 

Figure 6. Constraints imposed on stochastic biological networks and evolutionary 
dynamics by network architecture. Schematic representation of a potential network context (left) 
for each of the hypothetical stationary probability distributions associated to the fitness peak 
established by the blue and red points within the spaces of probability distributions represented on the 
right. Either of the two network architectures represented on the left is capable of achieving the 
stationary distribution over network-network state maps specified by the blue stationary distribution 
associated to a hypothetical fitness peak. On the other hand, only the network architecture from the 
top (and not the bottom) is capable of achieving the red stationary distribution representing an 
alternative potential fitness peak. 

Supplementary Material 

S1 Outline 

In the Supplementary Material we provide a more formal mathematical description of the results we 
make use of in the main text. In Sec. S2 we characterize biological network architectures as a collection 
of subsets, each individually referred to as a module, of network variables that defines a hypergraph over 
those network variables. Sec. S3 provides a functorial description of probability distributions defined over 
such network architectures and the mappings between those network architectures and the states of the 
modules of the network. Sec. S4 characterizes the manner in which a hierarchy of coarse-grained network 
states can be viewed as a refinement of the genotype-phenotype map, where the genotype and phenotype 
correspond to two different levels within this hierarchy, but maps between any two levels are considered 
to define valid coarse-grainings. Sec. S5 provides a sheaf theoretic formulation of the local and global 
consistency conditions that are logically imposed upon probability distributions over collections of such 
maps from some lower- to some higher-level in the hierarchy of coarse-grainings. Sec. S6 complements 
Sec. 5 of the main text providing a detailed example computation of the ratio of volumes between 
the polytopes corresponding to the global and local consistency conditions for the four-cycle network 
architecture. 


S2 Biological network architecture 

A module of a biological network is represented by a subset of variables, $O \subseteq L$. A biological network 
architecture, $\mathcal{G}$, may then be represented by a subset of all possible such modules. This is to say that $\mathcal{G}$ 
is a subset of the set of all subsets of $L$, $\mathcal{G} \subseteq \mathcal{P}(L)$, that satisfies the following two conditions 

1. $\bigcup_{O_i \in \mathcal{G}} O_i = \bigcup \mathcal{G} = L$, 

2. If $O, O' \in \mathcal{G}$ and $O \subseteq O'$ then $O = O'$. 

The first condition is just a statement that $\mathcal{G}$ represents a decomposition of the collection of all variables 
under consideration into subsets and this is why we refer to $\mathcal{G}$ as a collection of biological network modules. 
The second condition means simply that we will not consider nested subsets and so we will take for our 
$O \in \mathcal{G}$ the biggest $O \in \mathcal{G}$ that is not a subset of some other $O' \in \mathcal{G}$. The second condition also implies 
that if a given subset of variables $O'$ is compatible, in a sense to be explained more precisely in what 
follows, then any smaller subset of variables $O$ is also compatible. 

Mathematically, the two conditions given above state that $\mathcal{G}$ is a covering of the set $L$. This is 
equivalent to $\mathcal{G}$ being a reduced hypergraph, Sperner family, or clutter over $L$ [28]. Coverings $\mathcal{G}$ 
of the space of biological network variables contain the necessary information to make precise what 
we heuristically refer to at other points in this paper as modularity, in order to cohere with standard 
terminology in systems biology literature while attempting to submit our own precise interpretation of 
the relatively colloquial concept. 

S3 Functorial formulation of probability distributions over network modules 

As stated in Sec. 4, essentially all studies of biological networks consider states of subsets of variables 
that interact either directly or indirectly. We will represent these modules as subsets of $L$ and their states 
as functions from these subsets to $P$. 

The power set of $L$, which we shall denote as $\mathcal{P}(L)$, can be regarded as a category [45-48] in which 
the objects are subsets of $L$ and morphisms represent inclusion of a smaller subset into a larger superset 
(i.e. $O \subseteq O' \Leftrightarrow O \to O'$). 

Before proceeding, we define a few technical terms from the theory of sheaves and presheaves. We 
do not provide all necessary definitions to make use of the theory in more abstract contexts, for which 
we direct the reader to [45]. Given $L$ we define a presheaf over it to be a contravariant functor, 
$\mathrm{PSh}\colon \mathcal{P}(L)^{\mathrm{op}} \to \mathbf{Set}$, from the category of subsets of $L$, $\mathcal{P}(L)$, to the category of sets, $\mathbf{Set}$. Thus for 
every $U \in \mathcal{P}(L)$, $\mathrm{PSh}(U)$ is a set. 

An element $s \in \mathrm{PSh}(U)$ is a local section over $U$ with respect to $\mathrm{PSh}$. A covering of $L$ with respect to $\mathcal{P}(L)$ 
is an indexed set $\{O_i\}_{i \in I}$ where $O_i \subseteq L$ such that $\bigcup_{i \in I} O_i = L$. A system of local sections over a 
covering $\{O_i\}_{i \in I}$ is a set of ordered pairs of elements, $O_i$, of the covering and sections, $s_i \in \mathrm{PSh}(O_i)$, 
that comprise a set of the form 

$$\{(O_i, s_i) \mid i \in I,\ s_i \in \mathrm{PSh}(O_i)\}.$$

A system of local sections is globally consistent when there exists $s \in \mathrm{PSh}(L)$ such that for all $i \in I$ 

$$[\mathrm{PSh}(O_i \to L)](s) = s_i,$$

where $s$ is called a witness to global consistency. A system of local sections is said to be locally 
compatible when for all $i, j \in I$ the following are satisfied 

$$[\mathrm{PSh}(O_{ij} \to O_i)](s_i) = [\mathrm{PSh}(O_{ij} \to O_j)](s_j),$$

where $O_{ij} = O_i \cap O_j$. Note that if a system of local sections is globally consistent then it is locally 
compatible. A presheaf is said to be a sheaf when, given any locally compatible system of local sections, 
the system of local sections is globally consistent and there exists a unique witness. We refer to a 
presheaf that satisfies the existence but not the uniqueness condition as a half-sheaf. 

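The sheaf condition is also commonly expressed in terms of an equalizer diagram [46]. $\mathrm{PSh}$ is a sheaf 
if beginning with the lattice of inclusions among subsets of network variables 

$$L \xleftarrow{\;\rho\;} \coprod_{i \in I} O_i \;\overset{p_i}{\underset{p_j}{\leftleftarrows}}\; \coprod_{(i,j) \in I \times I} O_{ij}, \tag{S1}$$

for any covering $\{O_i\}_{i \in I}$ and applying the $\mathrm{PSh}$ functor to Eq. S1 results in 

$$\mathrm{PSh}(L) \xrightarrow{\;\mathrm{PSh}(\rho)\;} \prod_{i \in I} \mathrm{PSh}(O_i) \;\overset{\mathrm{PSh}(p_i)}{\underset{\mathrm{PSh}(p_j)}{\rightrightarrows}}\; \prod_{(i,j) \in I \times I} \mathrm{PSh}(O_{ij}), \tag{S2}$$

where there exists $s \in \mathrm{PSh}(L)$ such that all of the following conditions are satisfied 

1. $[\mathrm{PSh}(\rho)](s) = \{s|_{O_i} \mid i \in I\}$, 

2. for a family $s_i \in \mathrm{PSh}(O_i)$: $[\mathrm{PSh}(p_i)](s_i) = \{s_i|_{O_{ij}}\}$ and $[\mathrm{PSh}(p_j)](s_j) = \{s_j|_{O_{ij}}\}$, 

3. $s$ is unique in satisfying conditions 1 and 2 among elements of $\mathrm{PSh}(L)$. 

In this notation, if condition 3 is not satisfied, then $\mathrm{PSh}$ is a half-sheaf. 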

Given a presheaf and an associated covering, we may ask when it is the case that every locally 
compatible system of local sections over the covering is globally consistent. If this is the case, the 
covering is said to be half-sheaf-like because for the presheaves we study there is, in general, more than 
one witness to global consistency. 

None of the presheaves we work with in this paper are sheaves, except in degenerate cases. We work 
exclusively with presheaves and their coverings. Some coverings are half-sheaf-like. Surprisingly, some 
are not. This is to say that, if a covering is not half-sheaf-like, then not every locally compatible system 
of local sections over the covering is globally consistent [7,48-51]. The latter correspond to network 
architectures containing cycles whereas the former are acyclic. 

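A state of a subset of variables, $O \subseteq L$, is an assignment of values in $P$ to each variable in $O$, which 
is a tuple of length $|O|$ containing elements from $P$. This correspondence is determined by the presheaf 
functor $\mathcal{E} = \mathrm{Hom}(-, P)$. Specifically, this functor may be described as 

$$
\begin{aligned}
\mathcal{E}\colon \mathcal{P}(L)^{\mathrm{op}} &\to \mathbf{Set} \\
O &\mapsto P^{O} \\
O \subseteq O' &\mapsto \{e' \mapsto (e' \circ \iota) \mid e' \in P^{O'}\},
\end{aligned}
\tag{S3}
$$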


where l: O ^ O' is the injection of the subset O into O' (i.e. l{o) = o for all o G O). In this 
case, f is a sheaf, but note that this is not the case for the distribution presheaf, P, considered later. 
For example, if we consider the case in which we have two variables L = {h^h} and there are two 
potential states, P = {0,1}, then £ operates on the lattice of subsets generated by L to give spaces of 
functions containing the possible network-network state maps as exemplified in Fig. SI. For example, 
^ 2 }) = {^005 ^105 ^ 11 } where e^Kli) = 0 and eoi(/ 2 ) = 1- As another example, £{{li}) = {ej, ej} 

where eo(/i) = 0 and = 1. £{{li} ^{ 11 ^ 2 }) is given explicitly by 


e 

e 

e 
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00 

12 

01 

12 
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12 
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l-G en 


l-G en 


I —y 6-1 


i-G- 61 


(S4) 
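As a concrete illustration of Eq. S3 and Eq. S4, the following minimal Python sketch enumerates the state maps on a subset of variables and the restriction induced by an inclusion. The function names `states` and `restrict`, and the choice to represent a state map as a dictionary, are assumptions made for this illustration and are not part of the original analysis.

```python
from itertools import product

def states(O, P=(0, 1)):
    """E(O): all maps from the variables in O to the state set P,
    each represented as a dict {variable: value}."""
    O = tuple(O)
    return [dict(zip(O, values)) for values in product(P, repeat=len(O))]

def restrict(e, O):
    """E(O within O'): send a map e' on O' to its restriction e' o iota on O."""
    return {l: e[l] for l in O}

L = ("l1", "l2")
for e in states(L):
    # Each of the four global maps is sent to one of the two maps on {l1}.
    print(e, "->", restrict(e, ("l1",)))
```

Under these assumptions, the four maps on $\{l_1, l_2\}$ restrict pairwise to the two maps on $\{l_1\}$, in the pattern listed in Eq. S4.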


Next, we introduce extended probability distributions by defining a functor $\mathcal{D}$ that will compose with 
$\mathcal{E}$ to convert collections of network-network state maps into probability distributions over them. Given 
a finite set $S$, define $\mathcal{D}(S)$ to be the set of all maps from the power set $\mathcal{P}(S)$ to the interval $[0,1]$ which satisfy the 
following two conditions: For all $d \in \mathcal{D}(S)$, we have $d(S) \in \{0,1\}$.^1 For all $d \in \mathcal{D}(S)$ and all $A, B \subseteq S$, 
we have $d(A) + d(B) = d(A \cup B) + d(A \cap B)$. 

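Returning to the running example, 

$$\begin{aligned} \mathcal{D}(\mathcal{E}(\{l_1, l_2\})) &= \{ (p^{12}_{00}, p^{12}_{01}, p^{12}_{10}, p^{12}_{11}) \mid p^{12}_{00} \geq 0,\; p^{12}_{01} \geq 0,\; p^{12}_{10} \geq 0,\; p^{12}_{11} \geq 0,\; p^{12}_{00} + p^{12}_{01} + p^{12}_{10} + p^{12}_{11} = 1 \}, \\ \mathcal{D}(\mathcal{E}(\{l_1\})) &= \{ (p^{1}_{0}, p^{1}_{1}) \mid p^{1}_{0} \geq 0,\; p^{1}_{1} \geq 0,\; p^{1}_{0} + p^{1}_{1} = 1 \}. \end{aligned} \tag{S5}$$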


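If $S$ and $S'$ are finite sets, which in our case will usually be sets of network-network state maps given 
by $\mathcal{E}(O)$, $d \in \mathcal{D}(S)$ and $d' \in \mathcal{D}(S')$ are probability distributions over these spaces, and $f \colon S \to S'$ is a 
partial function, we will say that $f$ is compatible with $d$ and $d'$ when, for all $X \in \mathrm{img}(f)$, we have 

$$d'(X) = \begin{cases} \dfrac{d(f^{-1}(X))}{d(\mathrm{dom}(f))} & \text{if } d(\mathrm{dom}(f)) \neq 0, \\ 0 & \text{if } d(\mathrm{dom}(f)) = 0. \end{cases} \tag{S6}$$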


In other words, the map $f$ preserves ratios of probabilities of events. In the case where $f$ is a partial 
surjection ($\mathrm{img}(f) = S'$), compatibility completely determines $d'$ in terms of $d$ and thus $\mathcal{D}$ may be regarded 
as a functor from the subcategory of sets with partial surjections as morphisms to transformations on 
probability distributions: 

$$d' = \mathcal{D}(f)(d) = \begin{cases} \dfrac{d \circ f^{-1}}{d(\mathrm{dom}(f))} & d(\mathrm{dom}(f)) \neq 0, \\ 0 & d(\mathrm{dom}(f)) = 0. \end{cases} \tag{S7}$$

^1 Normally, we would only have $d(S) = 1$, but since we want to introduce conditionalization in a coherent way it becomes 
necessary to admit degenerate distributions where $d(S) = 0$ as well. This simplifies the exposition by not requiring us to 
worry about dividing by zero and having to introduce special cases when dealing with conditional probabilities and partial 
functions. Of course, it also means that we cannot automatically assume that an element of $\mathcal{D}(S)$ can be normalized without 
checking this fact, but in our examples this verification will turn out to be routine and trivial. 


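Specifically, when $f$ is a total surjection, this map corresponds to marginalization. For example, in the 
case $f = \mathcal{E}(\{l_1\} \subseteq \{l_1, l_2\})$, 

$$\mathcal{D}(\mathcal{E}(\{l_1\} \subseteq \{l_1, l_2\})) \colon \mathcal{D}(\mathcal{E}(\{l_1, l_2\})) \to \mathcal{D}(\mathcal{E}(\{l_1\})),$$

then 

$$d'(\{e^{1}_{0}\}) = \mathcal{D}(f)(d)(\{e^{1}_{0}\}) = \frac{d(f^{-1}(\{e^{1}_{0}\}))}{d(\mathrm{dom}(f))} = \frac{d(\{e^{12}_{00}, e^{12}_{01}\})}{1} = d(\{e^{12}_{00}\}) + d(\{e^{12}_{01}\}) = p^{12}_{00} + p^{12}_{01}. \tag{S9}$$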

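When $f$ is a partial isomorphism, it corresponds to conditionalization. For example, if $f$ is defined such 
that we condition on variable one being in state zero, $l_1 = 0$, 

$$f \colon \{e^{12}_{00}, e^{12}_{01}\} \subseteq \mathcal{E}(\{l_1, l_2\}) \to \{f(e^{12}_{00}), f(e^{12}_{01})\}, \tag{S10}$$

then 

$$d'(\{f(e^{12}_{00})\}) = \mathcal{D}(f)(d)(\{f(e^{12}_{00})\}) = \frac{d(f^{-1}(\{f(e^{12}_{00})\}))}{d(\{e^{12}_{00}, e^{12}_{01}\})} = \frac{d(\{e^{12}_{00}\})}{d(\{e^{12}_{00}\}) + d(\{e^{12}_{01}\})} = \frac{p^{12}_{00}}{p^{12}_{00} + p^{12}_{01}}. \tag{S11}$$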
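The action of $\mathcal{D}$ on partial surjections in Eq. S7 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `pushforward`, the encoding of states as strings such as `"01"`, and the numerical values of the distribution `d` are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def pushforward(d, f):
    """Eq. S7 evaluated on singletons. d maps each state to its probability;
    f is a partial function given as a dict from states in dom(f) to images."""
    mass = sum(d[s] for s in f)              # d(dom(f))
    image = {}
    for s, t in f.items():                   # accumulate d(f^{-1}({t}))
        image[t] = image.get(t, Fraction(0)) + d[s]
    if mass == 0:                            # degenerate case of Eq. S7
        return {t: Fraction(0) for t in image}
    return {t: p / mass for t, p in image.items()}

# Hypothetical distribution over the four states of (l1, l2).
d = {"00": Fraction(1, 10), "01": Fraction(2, 10),
     "10": Fraction(3, 10), "11": Fraction(4, 10)}

# Total surjection onto the state of l1: marginalization, as in Eq. S9.
marginal = pushforward(d, {s: s[0] for s in d})

# Partial isomorphism defined only where l1 = 0: conditionalization, as in Eq. S11.
conditional = pushforward(d, {"00": "00", "01": "01"})

print(marginal)
print(conditional)
```

With these illustrative numbers the marginal of $l_1 = 0$ is $p^{12}_{00} + p^{12}_{01} = 3/10$ and the conditional probability of $e^{12}_{00}$ is $p^{12}_{00} / (p^{12}_{00} + p^{12}_{01}) = 1/3$.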


Finally, when / is a general partial surjection, it corresponds to a combination of conditionalization and 
marginalization. 

In order to admit the basic tools of linear algebra for the purpose of calculations regarding relationships 
between spaces of probability distributions we explain how they embed into linear spaces. By definition, 
an extended probability distribution $p \in \mathcal{D}(S)$ is an element of $\mathbb{R}^{S}$. We denote the inclusion map as 

$$\mathrm{emb}(S) \colon \mathcal{D}(S) \to \mathbb{R}^{S}. \tag{S12}$$

Because a convex combination of two probability distributions is again a probability distribution, the 
image of $\mathrm{emb}(S)$ consists of a convex set and the origin point (corresponding to the degenerate zero 
distribution). Furthermore, if $n$ is the number of elements of the set $S$, this convex set works out to be 
the probability simplex with $n$ vertices, which we denote $\Delta_{n-1}$. In our example above, $\mathcal{D}(\mathcal{E}(\{l_1, l_2\}))$ is 
the tetrahedron $\Delta_3$. Since any vector $v$ may be written as $v = c_+ p_+ - c_- p_-$ where $c_+, c_- \in [0, \infty)$ 
and $p_+$ and $p_-$ are probability distributions, the image of $\mathrm{emb}(S)$ spans the vector space $\mathbb{R}^{S}$. For purposes 
of later reference, note that, if $f \colon S \to S'$ is a partial surjection, then $\mathcal{D}(f)$ extends to a fractional linear 
map, as in Eq. S11, from $\mathbb{R}^{S}$ to $\mathbb{R}^{S'}$ and that, in the special case where $f$ is a total surjection, as in 
Eq. S9, it is in fact a linear map. 


S4 Precise formulation of coarse-graining network states 

As described in Sec. 3 it is also possible to consider network states that derive from coarse-graining lower- 
level network states. Once this is done, one arrives at probability distributions over network modules like 
that introduced in Sec. 4. As a result of this, our conclusions that are formulated in terms of a single 
level of coarse-graining network-network state maps also apply to coarse-graining over multiple levels at 
once despite the fact that the parameters of the relevant probabilistic model are likely to be different. 
For each subset of variables $O \in \mathcal{P}(L)$, let $\phi_i(O)$ be the set of network states at level $i$ which 
can be determined from the expression levels of variables in $O$. Note that $\phi_i(O)$ may be empty if the 
set $O$ does not contain enough variables to determine the values of any network state at level $i$. When 
$O_1 \subseteq O_2 \in \mathcal{P}(L)$, we have a restriction map $\pi_i \colon \phi_i(O_2) \to \phi_i(O_1)$. These maps satisfy the consistency 
conditions that the restriction from $O$ to itself is the identity map and that successive restrictions compose, i.e. $\pi_i$ is a functor on $(\mathcal{P}(L), \subseteq)$. 

As stated earlier, we set the lowest-level network states on $O$ to be $P^{O}$ and the corresponding restriction map to be the one from $P^{O_2}$ to $P^{O_1}$. If $i < j$, let 
$\Omega_{ij}(O) \colon \phi_i(O) \to \phi_j(O)$ be the coarse-graining map which describes how higher level network states are 
determined from lower level network states. These maps are all surjections and, for consistency, we will 
require the following conditions: 

1. $\Omega_{ij}(O) \circ \Omega_{jk}(O) = \Omega_{ik}(O)$ whenever $i < j < k$. 

2. $\Omega_{ii}(O)$ is the identity map on $\phi_i(O)$. 

3. If $O_1 \subseteq O_2 \in \mathcal{P}(L)$ and $i > j$, then $\Omega_{ij}(O_1) \circ \pi_i = \pi_j \circ \Omega_{ij}(O_2)$, where $\pi_i$ and $\pi_j$ are the restriction maps from $O_2$ to $O_1$ at levels $i$ and $j$. 

In other words, $\Omega$ must be suitably functorial in both of its arguments. 

For example, if our lower level network states for a set of variables $O_1 = \{l_1, l_2, l_3, l_4\}$ are given by a 
set of binary sequences, then the projection of these network states down to the set $O_2 = \{l_3, l_4\}$ followed 
by mapping to the higher level network states $x = \{01, 10\}$ and $y = \{11\}$ is equivalent to first mapping 
to the higher-level network states $X$ and $Y$ and then projecting down to $O_2$, as shown by the equivalent 
paths from the top-left to the bottom-right in Fig. S2A. Of course, there is an equivalent diagram for the 
subset $\{l_1, l_2, l_3\}$. 

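Since the coarse-graining map from the lowest level is a surjection from $P^{O}$ onto $\phi_i(O)$, we can use it to map our probabilistic 
structures to $\phi_i(O)$, writing $\mathcal{E}_i$ and $\mathcal{D}_i$ for the resulting level-$i$ counterparts of $\mathcal{E}$ and $\mathcal{D}$. Then we end up with the overall 
relationships summarized in Fig. S3. As a consequence of the consistency conditions on the coarse-graining 
and restriction maps, there is a natural transformation between the functors $\mathcal{E}_i$ at different levels, implying that the 
corresponding naturality square, formed by the restriction maps between $O_1$ and $O_2$ at each level and the components of the natural transformation at $O_1$ and $O_2$, commutes 
for any $O_2 \subseteq O_1$. 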

Given a covering $\mathcal{G}$ of the space of biological network variables, we can consider the higher order 
network states associated to the elements of $\mathcal{G}$. For a suitable choice of cover and a suitable level of 
network states, it may happen that the network states associated to different elements of $\mathcal{G}$ are distinct. 
For instance, in the example of Fig. S2, if we take $\mathcal{G} = \{O_1, O_2\}$ where $O_1 = \{l_1, l_2, l_3\}$ and $O_2 = \{l_3, l_4\}$, 
we have $\phi_{i+1}(O_1) = \{u, v\}$ and $\phi_{i+1}(O_2) = \{x, y\}$. In such a case, if we were to perform one experiment 
which measured the network states $\{u, v\}$ and another experiment which measured $\{x, y\}$, then the result 
could be understood as examining the covering $\{O_1, O_2\}$ at network state level $i + 1$. 


S5 Sheaf-theoretic formulation of compatibility of distributions 
on network-network state maps 

Given a covering $\mathcal{G}$ of the space of variables, a compatible family for $\mathcal{G}$ with respect to $\mathcal{D} \circ \mathcal{E}$ is given by 
a family of distributions $\mathcal{D}(\mathcal{E}(\mathcal{G})) = \{d_O \in \mathcal{D}(\mathcal{E}(O)) \mid O \in \mathcal{G}\}$ such that for all $O, O' \in \mathcal{G}$ 

$$d_O|_{O \cap O'} = d_{O'}|_{O \cap O'}. \tag{S13}$$

This first set of conditions is later referred to as local consistency. The space of all such locally consistent 
distributions for a given covering is referred to as $\mathbb{L}(\mathcal{G})$ where 

$$\mathbb{L}(\mathcal{G}) = \{ d_O \in \mathcal{D}(\mathcal{E}(\mathcal{G})) \mid (\forall O, O' \in \mathcal{G}) \; d_O|_{O \cap O'} = d_{O'}|_{O \cap O'} \}. \tag{S14}$$

These conditions mean that any two distributions $d_O$ and $d_{O'}$ in the compatible family of distributions 
marginalize to the same distribution over the intersection of $O$ with $O'$. If these constraints are not 
satisfied, then there is no way to make a consistent assignment of probabilities to the states of even a 
single variable. In this case, in order to restore consistency, either one of the constraints must be eliminated or 
a variable may be duplicated to allow for the independent satisfaction of both constraints. 
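The local consistency condition of Eq. S13 can be checked mechanically. The sketch below is illustrative only: the helper names (`marginal`, `locally_consistent`, `table`) and the three pairwise tables are assumptions chosen for the example, not values taken from the figures of the paper.

```python
from fractions import Fraction as F
from itertools import combinations

def marginal(table, variables, keep):
    """Marginalize a table {state tuple: probability} over `variables` onto `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for state, p in table.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, F(0)) + p
    return out

def locally_consistent(family):
    """Check Eq. S13 for a family {tuple of variables: table}."""
    for O1, O2 in combinations(family, 2):
        shared = [v for v in O1 if v in O2]
        if marginal(family[O1], O1, shared) != marginal(family[O2], O2, shared):
            return False
    return True

def table(p00, p01, p10, p11):
    return {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}

# Hypothetical pairwise tables on a three-variable cycle.
family = {
    ("l1", "l2"): table(F(1, 10), F(4, 10), F(4, 10), F(1, 10)),
    ("l2", "l3"): table(F(4, 10), F(1, 10), F(1, 10), F(4, 10)),
    ("l3", "l1"): table(F(4, 10), F(1, 10), F(1, 10), F(4, 10)),
}
print(locally_consistent(family))
```

Every single-variable marginal in this family is uniform, so the check passes; it says nothing yet about whether a joint distribution on all three variables exists, which is the separate question of global consistency.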

If, moreover, this first condition implies the existence of $d \in \mathcal{D}(\mathcal{E}(L))$ such that $d|_O = d_O$ for all 
$O \in \mathcal{G}$, then the system is said to satisfy the global consistency condition. The space of all such globally 
consistent distributions for a given covering is referred to as $\mathbb{M}(\mathcal{G})$ where 

$$\mathbb{M}(\mathcal{G}) = \{ d_O \in \mathcal{D}(\mathcal{E}(\mathcal{G})) \mid (\exists d) \; d|_O = d_O \}. \tag{S15}$$

In general, the system of equations $d|_O = d_O$ for all $O \in \mathcal{G}$ is underdetermined and so local consistency 
does not imply global consistency. Local and global consistency are formalized as described in Sec. S3 in 
terms of sheaf theory as applied to the presheaf functors $\mathcal{E}$ and $\mathcal{D} \circ \mathcal{E}$. $\mathcal{E}$ alone turns out to be a sheaf 
because it satisfies the analogous conditions for all possible coverings $\mathcal{G}$ of $L$: for $\{e_O \in \mathcal{E}(O) \mid O \in \mathcal{G}\}$ 
such that $e_{O_1}|_{O_1 \cap O_2} = e_{O_2}|_{O_1 \cap O_2}$ there exists a unique $e \in \mathcal{E}(\bigcup_{O \in \mathcal{G}} O)$ such that $e_O = e|_O$ for all 
$O \in \mathcal{G}$. By analogy to Eq. S2 this is expressed by applying the same conditions to the equalizer diagram 

$$\mathcal{E}(L) \longrightarrow \prod_{i \in I} \mathcal{E}(O_i) \rightrightarrows \prod_{(i,j) \in I \times I} \mathcal{E}(O_{ij}). \tag{S16}$$

For $\mathcal{D} \circ \mathcal{E}$ the sheaf condition is not automatically satisfied and it only defines a presheaf. We examine 
the situation more closely to explicitly determine the necessary conditions for global consistency. 

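For a cover of the space of variables, $\mathcal{G}$, we can construct a linear operator, $G$, representing the 
relationship, $R = \bigsqcup_{O \in \mathcal{G}} \mathcal{E}(O \subseteq L) \subseteq \mathcal{E}(L) \times \mathcal{E}(\mathcal{G})$, between network-network state maps having as domain 
particular network modules given by the $O \in \mathcal{G}$ and those global network-network state maps defined on $L$. 
We would like to construct the matrix representation of $G$. In the first factor, $\mathcal{E}(L) = P^{L} = \{e^{L}_{j} \mid j \in P^{|L|}\}$. 
For the second factor, $\mathcal{E}(\mathcal{G}) = \bigsqcup_{O \in \mathcal{G}} P^{O} = \{e^{O}_{i} \mid O \in \mathcal{G}, \; i \in P^{|O|}\}$. So we have two sets of maps, one 
defined on $P^{L}$ and the other defined on $\mathcal{E}(O) = P^{O}$ for each $O \in \mathcal{G}$. This yields the method of specifying 
the intended relationship that defines $G$ for all $e^{O}_{i} \in \mathcal{E}(\mathcal{G})$ and $e^{L}_{j} \in \mathcal{E}(L)$ given in Eq. S17. This matrix 
can be viewed as an operator acting via matrix multiplication on distributions 

$$\begin{aligned} G \colon \mathcal{D}(\mathcal{E}(L)) &\to \mathcal{D}(\mathcal{E}(\mathcal{G})), \\ d &\mapsto \bigsqcup_{O \in \mathcal{G}} d|_O, \end{aligned}$$

and thereby taking a global distribution in $\mathcal{D}(\mathcal{E}(L))$, defined on network-network state maps whose domain 
is the full set of variables $L$, into the local distributions in $\mathcal{D}(\mathcal{E}(\mathcal{G}))$, which are defined relative to network 
modules contained in a covering $\mathcal{G}$ of the space of variables. $G$ can be specified for all $e^{O}_{i} \in \mathcal{E}(\mathcal{G})$ and 
$e^{L}_{j} \in \mathcal{E}(L)$: 

$$G_{e^{O}_{i}, e^{L}_{j}} = \begin{cases} 1 & e^{L}_{j}|_O = e^{O}_{i}, \\ 0 & \text{otherwise.} \end{cases} \tag{S17}$$

For example, given the covering $\mathcal{G} = \{\{l_1\}, \{l_2\}\}$ of a set of two variables $L = \{l_1, l_2\}$, the associated 
matrix $G$ is shown in Fig. S1B. $G$ provides a way of determining the distributions on network-network 
state maps for a given context (i.e. $\bigsqcup_{O \in \mathcal{G}} \mathcal{E}(O)$) derived from distributions (i.e. $\mathcal{D}(\mathcal{E}(L))$) 
defined on the global network-network state maps (i.e. $\mathcal{E}(L)$ as opposed to $\mathcal{E}(O)$). 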
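The construction of $G$ from Eq. S17 can be sketched as follows. The function name `marginalization_matrix` and the ordering of rows (modules in the order given, local states in lexicographic order) and columns (global states in lexicographic order) are assumptions of this sketch.

```python
from itertools import product

def marginalization_matrix(L, cover, P=(0, 1)):
    """Matrix G of Eq. S17: entry is 1 when the global state (column)
    restricts to the local state (row) on the module O, else 0."""
    global_states = list(product(P, repeat=len(L)))      # columns: e_j^L
    rows = []
    for O in cover:
        idx = [L.index(l) for l in O]
        for local in product(P, repeat=len(O)):          # rows: e_i^O
            rows.append([1 if tuple(g[i] for i in idx) == local else 0
                         for g in global_states])
    return rows

# Two variables covered by the two singletons, as in Fig. S1B.
G = marginalization_matrix(("l1", "l2"), [("l1",), ("l2",)])
for row in G:
    print(row)
```

Multiplying this matrix by the vector $(p^{12}_{00}, p^{12}_{01}, p^{12}_{10}, p^{12}_{11})$ yields the single-variable marginals $(p^{1}_{0}, p^{1}_{1}, p^{2}_{0}, p^{2}_{1})$.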

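Having expressed the relationship between global and local network-network state maps in terms of 
$G$, we now make use of sheaf theory in order to extract the global consistency conditions. Given Eq. S16 
and the associated conditions making $\mathcal{E}$ a sheaf, the corresponding diagram of linear spaces given by 

$$\mathbb{R}^{\mathcal{E}(L)} \xrightarrow{\;G\;} \bigoplus_{i \in I} \mathbb{R}^{\mathcal{E}(O_i)} \; \underset{H_2}{\overset{H_1}{\rightrightarrows}} \; \bigoplus_{(i,j) \in I \times I} \mathbb{R}^{\mathcal{E}(O_{ij})} \tag{S18}$$

is a half-sheaf, in the sense that it satisfies the first two conditions but not the third uniqueness condition 
given in Sec. S3. It follows from this fact that $\ker(H_1 - H_2) = \mathrm{im}(G)$. Moreover, although $\mathcal{D} \circ \mathcal{E}$ is a 
mere presheaf, it can be embedded into these linear spaces using the map defined in Eq. S12, thereby allowing for the 
expression of consistency conditions on $\mathcal{D} \circ \mathcal{E}$ in terms of linear equations constituting constraints on 
the relevant probabilities. The following diagram demonstrates the relationships between the spaces of 
probability distributions and the linear spaces in which they are embedded: 

$$\begin{array}{ccc} \mathbb{R}^{\mathcal{E}(L)} & \xrightarrow{\;G\;} & \mathbb{R}^{\mathcal{E}(\mathcal{G})} \\ \big\uparrow {\scriptstyle \mathrm{emb}_{\mathcal{E}(L)}} & & \big\uparrow {\scriptstyle \mathrm{emb}_{\mathcal{E}(\mathcal{G})}} \\ \mathcal{D}(\mathcal{E}(L)) & \longrightarrow & \mathcal{D}(\mathcal{E}(\mathcal{G})) \end{array} \tag{S19}$$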


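The locally, $\mathbb{L}(\mathcal{G})$, and globally, $\mathbb{M}(\mathcal{G})$, consistent polytopes correspond to the spaces of probability 
distributions satisfying the local and global consistency conditions described above. In terms of the 
diagrams expressing the half-sheaf condition, Eq. S18, and embedding map, Eq. S19, 

$$\begin{aligned} \mathbb{M}(\mathcal{G}) &= G(\mathrm{emb}_{\mathcal{E}(L)}(\mathcal{D}(\mathcal{E}(L)))), \\ \mathbb{L}(\mathcal{G}) &= \mathrm{emb}_{\mathcal{E}(\mathcal{G})}(\mathcal{D}(\mathcal{E}(\mathcal{G}))) \cap \ker(H_1 - H_2) = \mathrm{emb}_{\mathcal{E}(\mathcal{G})}(\mathcal{D}(\mathcal{E}(\mathcal{G}))) \cap G(\mathbb{R}^{\mathcal{E}(L)}). \end{aligned}$$

As in Eq. S5, 

$$\mathrm{emb}_{\mathcal{E}(\mathcal{G})}(\mathcal{D}(\mathcal{E}(\mathcal{G}))) = \left\{ p \in \mathbb{R}^{\mathcal{E}(\mathcal{G})} \;\middle|\; \begin{array}{l} (\forall O \in \mathcal{G}) \, (\forall i \in P^{|O|}) \;\; p^{O}_{i} \geq 0, \\ (\forall O \in \mathcal{G}) \;\; \sum_{i \in P^{|O|}} p^{O}_{i} = 1 \end{array} \right\}, \tag{S21}$$

$$\mathrm{emb}_{\mathcal{E}(L)}(\mathcal{D}(\mathcal{E}(L))) = \left\{ p \in \mathbb{R}^{\mathcal{E}(L)} \;\middle|\; \begin{array}{l} (\forall j \in P^{|L|}) \;\; p^{L}_{j} \geq 0, \\ \sum_{j \in P^{|L|}} p^{L}_{j} = 1 \end{array} \right\}. \tag{S22}$$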


In general the globally consistent polytope is a proper subset of the locally consistent one because 
$G$ is not invertible (the maximum entropy principle is commonly used to make an arbitrary choice in 
the face of this underdetermination). To determine explicit conditions on the probabilities we express 
$\mathbb{L}(\mathcal{G})$ and $\mathbb{M}(\mathcal{G})$ in terms of the fundamental subspaces associated to the linear map $G$. In order for 
a vector $v$ to be in $G(\mathbb{R}^{\mathcal{E}(L)})$, we must have $v = Gx$ for some $x \in \mathbb{R}^{\mathcal{E}(L)}$. The cokernel of $G$ gives 
the obstructions to this system $v = Gx$ having a solution. In order to eliminate these obstructions, 
constraints must be imposed on $\mathbb{R}^{\mathcal{E}(\mathcal{G})}$ and these constraints are given precisely via annihilating the 
cokernel, i.e. $G(\mathbb{R}^{\mathcal{E}(L)}) = \{v \mid (\forall u \in \mathrm{coker}\, G) \; u \cdot v = 0\}$. We then take the appropriate intersection to 
determine $\mathbb{L}(\mathcal{G})$ by requiring $v \in \mathcal{D}(\mathcal{E}(\mathcal{G}))$: 

$$\mathbb{L}(\mathcal{G}) = \{ v \in \mathcal{D}(\mathcal{E}(\mathcal{G})) \mid (\forall u \in \mathrm{coker}\, G) \; u \cdot v = 0 \}. \tag{S23}$$

Since $G$ is not invertible, the equation $v = Gx$ can only be solved up to an element of $\ker G$. $v = Gx$ can 
thus be solved on a subspace $T$ of $\mathbb{R}^{\mathcal{E}(L)}$ such that $T \oplus \ker G = \mathbb{R}^{\mathcal{E}(L)}$ to yield 

$$\mathbb{M}(\mathcal{G}) = \{ v \mid v = Gx, \; (\exists x \in T) \, (\exists y \in \mathcal{D}(\mathcal{E}(L))) \; x - y \in \ker G \}. \tag{S24}$$
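The cokernel used in Eq. S23 can be computed as the left null space of $G$, i.e. the null space of its transpose. The following is a minimal sketch using exact rational arithmetic; the function names `nullspace` and `cokernel` are hypothetical, and the matrix is the two-variable example of Fig. S1B rather than one of the larger cases treated in the paper.

```python
from fractions import Fraction

def nullspace(A):
    """Basis of {x : A x = 0} via exact Gauss-Jordan elimination."""
    A = [[Fraction(v) for v in row] for row in A]
    n_cols = len(A[0])
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: free variable
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [v / lead for v in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    basis = []
    for free in range(n_cols):
        if free in pivots:
            continue
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for row, pc in zip(A, pivots):    # back-substitute the pivot variables
            x[pc] = -row[free]
        basis.append(x)
    return basis

def cokernel(G):
    """Vectors u with u . (G x) = 0 for every x: the null space of G transposed."""
    return nullspace([list(col) for col in zip(*G)])

G = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
for u in cokernel(G):
    print([int(c) for c in u])
```

For this small example the single cokernel vector expresses that the two single-variable marginals must carry the same total mass, which is the only obstruction to $v = Gx$ having a solution here.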











If the embedding into linear spaces is to be considered explicitly, then Eq. S21 and Eq. S22 can be 
substituted for $\mathcal{D}(\mathcal{E}(\mathcal{G}))$ and $\mathcal{D}(\mathcal{E}(L))$ in Eq. S23 and Eq. S24. In order to obtain inequalities that 
define $\mathbb{M}(\mathcal{G})$, Fourier-Motzkin elimination can be used to eliminate $x$ and $y$. Alternatively, one can use 
the fact ([30], proposition 8.3) that $\mathbb{M}(\mathcal{G})$ is given by removing the non-integer vertices from a vertex 
representation of $\mathbb{L}(\mathcal{G})$, together with the ability to interconvert between vertex and inequality representations, to 
compute the same inequalities as described in Supplementary Material Sec. S6. 


S5.1 Example of apparent satisfaction of unsatisfiable constraints 

The inequalities defining $\mathbb{M}(\mathcal{G})$ were derived under the assumption that the two-element probabilities 
were obtained by marginalizing a three-element distribution. If some other procedure, such as 
conditionalization, is used to obtain them instead, these inequalities need not apply. For example, suppose now 
that $L = \{l_1, l_2, l_3\}$, $P = \{0, 1, 2\}$, $\mathcal{G} = \{\{l_1, l_2\}, \{l_2, l_3\}, \{l_3, l_1\}\}$, where we have simply added an element 
to $P$ relative to the example described above. In the previous example the marginal maps were given by 
$\mathcal{D}(\mathcal{E}(O \subseteq L))$ with one for each $O \in \mathcal{G}$. If we combine these marginal maps with conditioning on one 
out of the three variables being in state two and each of the other two being in states zero or one, then 
we have instead $\mathcal{D}(\pi_1)$, $\mathcal{D}(\pi_2)$, $\mathcal{D}(\pi_3)$ where $\pi_1 = \mathcal{E}(\{l_1, l_2\} \subseteq L)|\{e^{123}_{ij2} \mid i, j \in \{0,1\}\}$, $\pi_2 = \mathcal{E}(\{l_2, l_3\} \subseteq L)|\{e^{123}_{2ij} \mid i, j \in \{0,1\}\}$, 
and $\pi_3 = \mathcal{E}(\{l_3, l_1\} \subseteq L)|\{e^{123}_{i2j} \mid i, j \in \{0,1\}\}$. In this case, if we have the following 
assignment of probabilities for a distribution $d$ 

$$\begin{array}{lll} p^{123}_{002} = 1/30 & p^{123}_{020} = 2/15 & p^{123}_{200} = 2/15 \\ p^{123}_{012} = 2/15 & p^{123}_{021} = 1/30 & p^{123}_{201} = 1/30 \\ p^{123}_{102} = 2/15 & p^{123}_{120} = 1/30 & p^{123}_{210} = 1/30 \\ p^{123}_{112} = 1/30 & p^{123}_{121} = 2/15 & p^{123}_{211} = 2/15 \end{array}$$

with all other probabilities being zero, then $\mathcal{D}(\pi_1)(d)$, $\mathcal{D}(\pi_2)(d)$, $\mathcal{D}(\pi_3)(d)$ are equivalent to the probability 
tables in Fig. 3A, which, as shown in Sec. 6, could not be achieved by marginalization alone. For example, 
given that $\mathrm{dom}(\pi_1) = \{e^{123}_{002}, e^{123}_{012}, e^{123}_{102}, e^{123}_{112}\}$, then $d(\mathrm{dom}(\pi_1)) = \frac{1}{30} + \frac{2}{15} + \frac{2}{15} + \frac{1}{30} = \frac{1}{3}$. Substituting this 
factor and the fact that $\pi_1^{-1}(e^{12}_{ij}) = e^{123}_{ij2}$ into Eq. S7 
then renormalizes probabilities, resulting in $p^{12}_{00} = 0.1$, $p^{12}_{01} = 0.4$, $p^{12}_{10} = 0.4$, $p^{12}_{11} = 0.1$ along with the 
analogs for $p^{23}$ and $p^{13}$, which are precisely equivalent to what appears in Fig. 3A as suggested above. 

If constraints consistent with those of Fig. 3A are placed on the given network, either the network must 
add another variable in order to satisfy them directly or the network context imposing those constraints 
must coarse-grain the network in a suitable way. In what follows, we argue that the former is much more 
plausible than the latter. This ultimately suggests conditions in which cycle breakage may be selected 
for to relieve inconsistent constraints that can arise when cycles are present. 
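The renormalization in this example can be reproduced with exact fractions. In the sketch below the assignment of the values 1/30 and 2/15 to the states with $l_2 = 2$ and with $l_1 = 2$ is an assumption (only the states with $l_3 = 2$ are pinned down by the renormalized values quoted in the text), and the helper name `condition_and_marginalize` is hypothetical.

```python
from fractions import Fraction as F

# States are strings "abc" giving the values of (l1, l2, l3).
d = {"002": F(1, 30), "012": F(2, 15), "102": F(2, 15), "112": F(1, 30),
     "020": F(2, 15), "021": F(1, 30), "120": F(1, 30), "121": F(2, 15),
     "200": F(2, 15), "201": F(1, 30), "210": F(1, 30), "211": F(2, 15)}

def condition_and_marginalize(d, pos):
    """Condition on the variable at index `pos` being in state 2, then drop it
    (the combination of conditionalization and marginalization in Eq. S7)."""
    dom = {s: p for s, p in d.items() if s[pos] == "2"}
    mass = sum(dom.values())                        # d(dom(pi))
    return {s[:pos] + s[pos + 1:]: p / mass for s, p in dom.items()}

p12 = condition_and_marginalize(d, 2)               # condition on l3 = 2
print(sum(d.values()), p12)
```

Each conditioning event carries total probability 1/3, so dividing by it turns 1/30 into 0.1 and 2/15 into 0.4, matching the values stated above for $p^{12}$.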


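S6 Example volume ratio computation for the four-cycle network architecture 

For the purposes of this example, we take the full set of variables to be $L = \{l_1, l_2, l_3, l_4\}$. Consider 
the case in which each of the network modules under consideration has two variables and we specify the 
covering of the space of variables given by $\mathcal{G} = \{\{l_1, l_2\}, \{l_1, l_4\}, \{l_3, l_2\}, \{l_3, l_4\}\}$. We will compute $\mathbb{L}(\mathcal{G})$ 
using the same method which was used for the example of three variables. By analogy with Eq. 17, the 
local consistency conditions now are as follows: 

$$\begin{aligned} p^{1}_{0} &= p^{12}_{00} + p^{12}_{01} = p^{14}_{00} + p^{14}_{01}, & p^{1}_{1} &= p^{12}_{10} + p^{12}_{11} = p^{14}_{10} + p^{14}_{11}, \\ p^{2}_{0} &= p^{12}_{00} + p^{12}_{10} = p^{32}_{00} + p^{32}_{10}, & p^{2}_{1} &= p^{12}_{01} + p^{12}_{11} = p^{32}_{01} + p^{32}_{11}, \\ p^{3}_{0} &= p^{32}_{00} + p^{32}_{01} = p^{34}_{00} + p^{34}_{01}, & p^{3}_{1} &= p^{32}_{10} + p^{32}_{11} = p^{34}_{10} + p^{34}_{11}, \\ p^{4}_{0} &= p^{14}_{00} + p^{14}_{10} = p^{34}_{00} + p^{34}_{10}, & p^{4}_{1} &= p^{14}_{01} + p^{14}_{11} = p^{34}_{01} + p^{34}_{11}. \end{aligned} \tag{S26}$$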


Likewise, the equations determined by the conditions $v = G(x)$, which are analogous to the matrix $G$ in 
Fig. S1B, are now, for all $a, b, c, d \in \{0, 1\}$, 

$$\begin{aligned} p^{12}_{ab} &= \sum_{c,d} p^{1234}_{abcd}, & p^{14}_{ad} &= \sum_{b,c} p^{1234}_{abcd}, \\ p^{32}_{cb} &= \sum_{a,d} p^{1234}_{abcd}, & p^{34}_{cd} &= \sum_{a,b} p^{1234}_{abcd}, \end{aligned} \tag{S27}$$

so that, for example, $p^{12}_{00} = p^{1234}_{0000} + p^{1234}_{0001} + p^{1234}_{0010} + p^{1234}_{0011}$. These sixteen equations are displayed in matrix form in Table S1. 
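The dimension count behind these equations can be checked numerically. The sketch below rebuilds the 16 by 16 marginalization matrix for the four-cycle covering and computes its rank exactly; the helper names and the row and column orderings are assumptions of this illustration and need not match Table S1.

```python
from fractions import Fraction
from itertools import product

def marginalization_matrix(L, cover, P=(0, 1)):
    """Rows: local states of each module; columns: global states (Eq. S17)."""
    global_states = list(product(P, repeat=len(L)))
    rows = []
    for O in cover:
        idx = [L.index(l) for l in O]
        for local in product(P, repeat=len(O)):
            rows.append([1 if tuple(g[i] for i in idx) == local else 0
                         for g in global_states])
    return rows

def rank(A):
    """Rank by exact forward elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in A]
    r = 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            if A[i][c] != 0:
                factor = A[i][c] / A[r][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r

L = ("l1", "l2", "l3", "l4")
cover = [("l1", "l2"), ("l1", "l4"), ("l3", "l2"), ("l3", "l4")]
G = marginalization_matrix(L, cover)
print(len(G), len(G[0]), rank(G), len(G) - rank(G))
```

The difference between the number of rows and the rank is the dimension of the cokernel, i.e. the number of independent linear constraints that any vector of the form $Gx$ must satisfy.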

Rather than proceeding to compute $\mathbb{M}(\mathcal{G})$ using elimination of inequalities as before, we will instead 
make use of the fact that the extremal points of $\mathbb{M}(\mathcal{G})$ happen to be the extremal points of $\mathbb{L}(\mathcal{G})$ with integer 
coordinates. This is the approach which was used to compute the volume ratios shown in Fig. 5. More 
specifically, those computations were done using a computer program based on the following algorithm, 
which is available via a virtual machine that can be reconstructed using the instructions available on 
github: 


1. Compute (a basis for) the cokernel of $G$. The cokernel gives the obstructions to the system $GX = V$ 
having a solution. In order to eliminate these obstructions, constraints must be imposed on $\mathbb{R}^{\mathcal{E}(\mathcal{G})}$, 
and these constraints are given precisely via annihilating the cokernel. 

2. Use the constraints on $\mathbb{R}^{\mathcal{E}(\mathcal{G})}$ from step 1, necessary for the system $GX = V$ to have a solution, 
to eliminate variables from the system of inequalities $V \geq 0$, giving a half-space representation or 
H-representation of the polytope $\mathbb{L}(\mathcal{G})$. This can be used to compute $\mathrm{Vol}(\mathbb{L}(\mathcal{G}))$. 

3. Compute the vertices of $\mathbb{L}(\mathcal{G})$ from the H-representation determined in step 2, giving a vertex 
representation or V-representation of $\mathbb{L}(\mathcal{G})$. 

4. Filter the non-integer rational vertices from the collection computed in step 3 to produce a 
corresponding V-representation of $\mathbb{M}(\mathcal{G})$ ([30], proposition 8.3). 

5. Compute $\mathrm{Vol}(\mathbb{M}(\mathcal{G}))$ from the V-representation of $\mathbb{M}(\mathcal{G})$. 


For standard computations on polytopes, we make use of the standard algorithms incorporated by 
the polymake project [52]. In some cases, the volume computation is too costly to perform exactly. In 
those cases we use the approximation given in [53]. We now return to our example of four variables, 
$\mathcal{G} = \{\{l_1, l_2\}, \{l_1, l_4\}, \{l_3, l_2\}, \{l_3, l_4\}\}$ and $P = \{0, 1\}$, and use it to walk through key components of the 
algorithm. 




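The equalities derived by computing the cokernel of the matrix $G$ given in Table S1 and adjoining 
rows that enforce the normalization of the marginal distributions are represented as a matrix in Eq. S28. 

$$\left[\begin{array}{rrrrrrrrrrrrrrrr|r}
-1 & -1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & -1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
-1 & -1 & -1 & -1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & -1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{array}\right] \tag{S28}$$
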

The final column represents the right-hand side of each equality. It turns out all but one of the 
normalization conditions is linearly dependent with respect to the other equalities and so we can reduce this set 
of 7 + 4 = 11 constraints to the 8 represented again in matrix form in Eq. S29. 


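$$\left[\begin{array}{rrrrrrrrrrrrrrrr|r}
1 & 0 & 0 & -1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & -1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{array}\right] \tag{S29}$$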
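The reduction from eleven constraints to eight can be checked by row-reducing the augmented matrix of Eq. S28 and comparing with Eq. S29. The sketch below does this with exact rational arithmetic; the matrices are transcribed from the two equations, and the helper names `parse` and `rref` are assumptions of this illustration rather than part of the original computation, which used polymake.

```python
from fractions import Fraction

S28 = """
-1 -1  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0
 0  0 -1 -1  0  0  1  1  0  0  0  0  0  0  0  0  0
-1  0 -1  0  0  0  0  0  1  1  0  0  0  0  0  0  0
 0 -1  0 -1  0  0  0  0  0  0  1  1  0  0  0  0  0
 0  0  0  0  0  0  0  0 -1  0 -1  0  1  1  0  0  0
 0  0  0  0 -1  0 -1  0  0  0  0  0  1  0  1  0  0
-1 -1 -1 -1  1  0  1  0  1  0  1  0 -1  0  0  1  0
 1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  1
 0  0  0  0  1  1  1  1  0  0  0  0  0  0  0  0  1
 0  0  0  0  0  0  0  0  1  1  1  1  0  0  0  0  1
 0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1
"""
S29 = """
 1  0  0 -1  0  0  1  1  0  0  1  1  0  0  0  0  1
 0  1  0  1  0  0  0  0  0  0 -1 -1  0  0  0  0  0
 0  0  1  1  0  0 -1 -1  0  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  1  0  0  0  0  0  0  1  0  1  1
 0  0  0  0  0  1  0  1  0  0  0  0  0 -1  0 -1  0
 0  0  0  0  0  0  0  0  1  0  1  0  0  0  1  1  1
 0  0  0  0  0  0  0  0  0  1  0  1  0  0 -1 -1  0
 0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1
"""

def parse(text):
    return [[Fraction(tok) for tok in line.split()]
            for line in text.strip().splitlines()]

def rref(rows):
    """Reduced row echelon form; zero rows are dropped from the result."""
    A = [row[:] for row in rows]
    r = 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [v / lead for v in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return [row for row in A if any(row)]

reduced = rref(parse(S28))
print(len(reduced), reduced == parse(S29))
```

Because the reduced row echelon form of a matrix is unique, agreement with Eq. S29 confirms that the two systems define the same affine subspace.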


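These equalities can now be substituted into the positivity inequalities necessary to define any space of 
probability distributions. This yields a set of inequalities, Eq. S30, that specify an H-representation of 
the polytope $\mathbb{L}(\mathcal{G})$. This is the modular polytope, which is a subspace of $\mathbb{R}^{\mathcal{E}(\mathcal{G})}$ associated to distributions 
consistent with the linear transformation $G$. 

$$\left[\begin{array}{rrrrrrrrr}
1 & 1 & -1 & -1 & -1 & -1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & -1 \\
0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right] \tag{S30}$$
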


A row $(a_0, a_1, \ldots, a_d)$ corresponds to the inequality $a_0 + a_1 x_1 + \ldots + a_d x_d \geq 0$. The embedded identity 
matrix has, in this particular case, eight rows that specify the positivity of the variables corresponding 
to each of the eight dimensions. Transforming this inequality or H-representation 
to a vertex or V-representation of the modular polytope produces Eq. S31. 

$$\left[\begin{array}{ccccccccc}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1/2 & 1/2 & 0 & 1/2 & 0 & 1/2 & 1/2 & 0 \\
1 & 1/2 & 0 & 1/2 & 0 & 1/2 & 1/2 & 1/2 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1/2 & 1/2 & 0 & 0 & 1/2 & 0 & 0 & 1/2 \\
1 & 1/2 & 0 & 1/2 & 1/2 & 0 & 0 & 0 & 1/2 \\
1 & 0 & 0 & 1/2 & 0 & 1/2 & 0 & 0 & 1/2 \\
1 & 0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 & 1/2 \\
1 & 0 & 1/2 & 0 & 0 & 1/2 & 1/2 & 1/2 & 0 \\
1 & 0 & 0 & 1/2 & 1/2 & 0 & 1/2 & 1/2 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1
\end{array}\right] \tag{S31}$$



This completes steps 1-3 of the algorithm outlined above. Step 4 is trivial; to obtain the V-representation 
of $\mathbb{M}(\mathcal{G})$, we strike out the rows in which 1/2 appears. Finally, we compute the volume of the polytope 
whose vertices are the rows of Eq. S31 to obtain Vol(L(0)) = and the volume of the polytope whose 
vertices are rows of integers to obtain Vol(M(0)) = yielding a ratio = |. 
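Steps 3 and 4 can be sanity-checked directly from the two matrices. The following sketch, with the inequality rows transcribed from Eq. S30 and the vertex rows from Eq. S31, verifies that every listed vertex satisfies every inequality and then filters the integer vertices. The variable names are assumptions of this illustration, and the volume computations themselves are not reproduced here.

```python
from fractions import Fraction

H_REP = """
1 1 -1 -1 -1 -1 0 0 0
0 -1 0 0 1 1 0 0 0
0 -1 1 1 0 0 0 0 0
1 0 -1 0 0 0 -1 0 -1
0 0 0 -1 0 0 1 0 1
1 0 0 0 -1 0 0 -1 -1
0 0 0 0 0 -1 0 1 1
1 0 0 0 0 0 -1 -1 -1
0 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
"""
V_REP = """
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 1 0
1 0 0 0 0 0 1 0 0
1 1/2 1/2 0 1/2 0 1/2 1/2 0
1 1/2 0 1/2 0 1/2 1/2 1/2 0
1 0 0 0 0 0 0 0 0
1 1/2 1/2 0 0 1/2 0 0 1/2
1 1/2 0 1/2 1/2 0 0 0 1/2
1 0 0 1/2 0 1/2 0 0 1/2
1 0 1/2 0 1/2 0 0 0 1/2
1 0 1/2 0 0 1/2 1/2 1/2 0
1 0 0 1/2 1/2 0 1/2 1/2 0
1 1 1 0 1 0 0 0 0
1 0 0 0 1 0 0 0 0
1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 1 0 0
1 0 0 0 0 1 0 1 0
1 0 0 0 1 0 1 0 0
1 0 1 0 0 0 0 1 0
1 1 0 1 0 1 0 0 1
1 1 0 1 1 0 1 0 0
1 1 1 0 0 1 0 1 0
1 0 0 1 0 0 0 0 1
1 0 0 0 0 1 0 0 1
"""

def parse(text):
    return [[Fraction(tok) for tok in line.split()]
            for line in text.strip().splitlines()]

inequalities, vertices = parse(H_REP), parse(V_REP)

# The leading 1 of each vertex row pairs with a_0, so the dot product of an
# inequality row with a vertex row equals a_0 + a_1 x_1 + ... + a_8 x_8.
feasible = all(sum(a * x for a, x in zip(row, v)) >= 0
               for v in vertices for row in inequalities)

# Step 4: keep only the vertices whose coordinates are all integers.
integer_vertices = [v for v in vertices if all(x.denominator == 1 for x in v)]

print(feasible, len(vertices), len(integer_vertices))
```

The integer rows retained by the filter are the candidates for the V-representation of $\mathbb{M}(\mathcal{G})$, while the rows containing 1/2 belong only to $\mathbb{L}(\mathcal{G})$.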
Figure S1. Example of the functor mapping subsets of variables to measurable spaces. (A) 
On the left hand side are subsets of $L = \{l_1, l_2\}$ ordered by inclusion. On the right hand side are the 
spaces of network-network state maps also ordered by inclusion. The labels for the maps define them. 
For example, $e^{12}_{01}(l_1) = 0$ and $e^{12}_{01}(l_2) = 1$. (B) For the given covering, $\mathcal{G} = \{\{l_1\}, \{l_2\}\}$, the associated marginalization 
matrix acting on the probability vector, $(p^{12}_{00}, p^{12}_{01}, p^{12}_{10}, p^{12}_{11}) \mapsto (p^{1}_{0}, p^{1}_{1}, p^{2}_{0}, p^{2}_{1})$: 

$$\begin{array}{c|cccc} & p^{12}_{00} & p^{12}_{01} & p^{12}_{10} & p^{12}_{11} \\ \hline e^{1}_{0} & 1 & 1 & 0 & 0 \\ e^{1}_{1} & 0 & 0 & 1 & 1 \\ e^{2}_{0} & 1 & 0 & 1 & 0 \\ e^{2}_{1} & 0 & 1 & 0 & 1 \end{array}$$
Figure S2. Example coarse-graining of phenotypes. (A) Consider the example where 
$L = \{l_1, l_2, l_3, l_4\}$, $\mathcal{G} = \{O_1, O_2\}$, $O_1 = \{l_1, l_2, l_3\}$ and $O_2 = \{l_3, l_4\}$. The top left panel shows two 
higher-level phenotypes $X$ and $Y$. The bottom left corner shows the five different expression states of 
four genes in $L$ from which these phenotypes are coarse-grained. The right side shows the respective 
projections onto genes $\{l_3, l_4\}$. The projection maps are defined in Supplementary 
Material Sec. S4. (B) The different combinations of expression states of genes $\{l_3, l_4\}$ result in two 
different phenotypes. If both genes are expressed, metabolite $y$ is produced, whereas if only one of the two 
genes is expressed, metabolite $x$ is produced. The red and green boxes represent arbitrary promoters. 


















































[Figure S3 diagram: probabilities of collections of expression states; macroscopic phenotype at level $i+1$ ($\phi_{i+1}$, incl.); microscopic phenotype at level $i$ ($\phi_i$, incl.); (Meas, incl.); allowed/viable interactions.] 

Figure S3. Mathematical relationships defining the hierarchy of network states via 
coarse-graining. 
Figure S4. Schematic synthetic gene circuit capable of exhibiting apparent inconsistency. 

A synthetic gene circuit consisting of four genes possesses one gene (purple) whose product is assumed 
to be present at low copy number and binds randomly with equal affinity to operator sites existing 
within the operons of each of the other three genes (red, blue, and green). These latter three genes each 
possess operator sites for the other two, but do not possess autoregulatory operators. They also each 
exhibit three states represented by three dynamical modes that may involve intermediates not explicitly 
represented here [54,55]. If the first gene is bound to the operator of another gene, the output is forced 
into a zero frequency infinite period, or DC, mode (state 2) regardless of the binding state of the other 
operators. If the first gene is unbound, then the expression state can be switched between low (state 0) 
and high (state 1) frequency modes depending upon the binding states of other genes as indicated. Note 
that operators for each of genes one to three are insensitive to the DC mode. Observing pairs of genes 
one to three and ignoring the state corresponding to the DC mode can lead to apparent inconsistency. 
Figure S5. Hierarchical relationships among all possible classes of hypergraphs that are 
not graphs (i.e. not 2-uniform) but have cycles. (A) There is a Hasse diagram for the lattice of 
network architectures analogous to that of Fig. 4A but defined on four rather than only three variables. 
Within this lattice some of the graphs have cycles and some do not. (B) The highest levels of the Hasse 
diagram associated to the lattice of network architectures on four variables containing hypergraphs 
having cycles. (C) and (D) contain lower levels of network architectures containing cycles. Each of the 
four panels in (D) are on the same level. In total, each level represents an isomorphism class of 
hypergraphs. Therefore, there are five isomorphism classes of non-2-uniform hypergraphs representing 
network architectures on four variables that contain cycles, leading to the relationship between spaces of 
probability distributions on associated genotype-phenotype maps analogous to that of Fig. 4C. 


















































































































